Channel: Statistics Help @ Talk Stats Forum - SAS

Large value for variation explained using PROC VARCLUS

Hi all,

I am relatively new to SAS; I have been using it for the past month to do statistical analysis on small and medium-sized datasets. I am now working with a much larger dataset (~40,000 observations) with around 300 variables.

More than half of these 300 variables are non-numeric (binary or categorical), so I have created another dataset with the same number of observations in which every variable is numeric. My conversion rules are as follows:

1. Binary: Y = 1, N = 0, missing = 0
2. Categorical: integer codes 0, 1, 2, ..., n, where n is the number of categories; missing = 0
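
As a rough sketch, the two rules above might look like this in a DATA step. The input dataset rawtable, the variables smoker_flag and region, and the level names 'North' and 'South' are hypothetical placeholders, not taken from the actual data:

```sas
/* Sketch only: rawtable, smoker_flag, region and the level names are hypothetical. */
data worktable;
    set rawtable;

    /* Rule 1: binary Y/N becomes 1/0; missing (or anything else) becomes 0 */
    if smoker_flag = 'Y' then smoker_num = 1;
    else smoker_num = 0;

    /* Rule 2: each category gets an integer code; missing becomes 0 */
    select (region);
        when ('North') region_num = 1;
        when ('South') region_num = 2;
        otherwise      region_num = 0;
    end;

    /* Keep only the numeric versions */
    drop smoker_flag region;
run;
```

Note that under these rules a missing value ends up with the same code as N for binary variables.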

I then ran PROC VARCLUS on that data, hoping to reduce the number of variables and build a better prediction model:

proc varclus data=worktable maxeigen=0.7 outtree=tree maxclusters=2;
var a-z;
run;

This gives me a total variation explained of about 30.

Increasing to maxclusters=3 gives about 50, and going all the way up to maxclusters=30 gives about 120.

I then increased maxclusters to 40 and got about 130; the total variation explained always goes up. From what I have read online, this value is normally in the 30 to 40 range and actually goes down once maxclusters increases beyond 10.

I am aware that every dataset is different. However, my result seems quite abnormal. Do you have any suggestions or explanations for why the total variation explained is so large?

Thank you very much,

Thao
