Saturday, November 8, 2008

Variable clustering (VARCLUS)

Variable clustering can be viewed as a clustering of individuals applied to the transposed dataset. However, instead of using the Euclidean distance to measure the proximity between examples, we use the correlation coefficient (or the squared correlation coefficient) to measure the similarity between variables.
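
To make the idea concrete, here is a minimal sketch in plain Python. It is not the implementation of the VARHCA, VARKMeans or VARCLUS components; it only illustrates the principle under simple assumptions: the dissimilarity between two variables is taken as 1 - r², the groups are merged with an average-linkage agglomerative procedure, and the data are a synthetic toy example. The function names (pearson, cluster_variables) and the variable names (x1 to x4) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equally sized lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns, n_clusters):
    """Agglomerative clustering of variables (average linkage).

    columns: dict mapping a variable name to its list of values.
    The dissimilarity is 1 - r^2, so strongly correlated variables are
    close to each other whatever the sign of the correlation.
    """
    names = list(columns)
    d = {(a, b): 1.0 - pearson(columns[a], columns[b]) ** 2
         for a in names for b in names if a != b}

    def linkage(c1, c2):
        # Average dissimilarity between the members of two groups.
        return sum(d[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest groups
        del clusters[j]
    return clusters


# Synthetic toy data (an assumption made for this illustration only):
# x2 follows x1, x4 is negatively related to x3.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [2 * a + random.gauss(0, 0.1) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
x4 = [-a + random.gauss(0, 0.1) for a in x3]
columns = {"x1": x1, "x2": x2, "x3": x3, "x4": x4}

clusters = cluster_variables(columns, n_clusters=2)
print(clusters)
```

With the squared correlation, x3 and x4 end up in the same group even though their correlation is negative; using 1 - r instead would keep them apart.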

Variable clustering is useful in several situations. It can be used to detect the main dimensions of the dataset; it can also serve in a feature selection process, to select the most relevant attributes for the subsequent analysis. The synthesized variable that represents each group, i.e. the first factor of a PCA (Principal Component Analysis) computed on the variables of the group, can also be used.
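
The synthesized variable of a group can be sketched as follows. This is only an illustration, assuming that the representative of a group is the first principal component of its standardized variables, computed here by power iteration on the correlation matrix. The function names (standardize, first_component) and the toy data are hypothetical, and the code does not reproduce the Tanagra components.

```python
import math
import random


def standardize(x):
    """Center and scale a list of values (population standard deviation)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / s for v in x]


def first_component(group, n_iter=200):
    """First principal component of a group of variables.

    group: list of columns (lists of values of equal length).
    Returns (scores, weights, eigenvalue), where eigenvalue is the
    variance of the component, between 1 and the number of variables.
    """
    z = [standardize(col) for col in group]
    p, n = len(z), len(z[0])
    # Correlation matrix of the group.
    r = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
         for i in range(p)]
    # Power iteration, started from the first row of r (a simple choice
    # that avoids a start vector orthogonal to the leading eigenvector
    # in the usual cases).
    norm = math.sqrt(sum(v * v for v in r[0]))
    w = [v / norm for v in r[0]]
    for _ in range(n_iter):
        w_new = [sum(r[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    eigenvalue = sum(w[i] * r[i][j] * w[j] for i in range(p) for j in range(p))
    # Score of each individual on the synthesized variable.
    scores = [sum(w[j] * z[j][k] for j in range(p)) for k in range(n)]
    return scores, w, eigenvalue


# Synthetic group of three variables driven by one latent factor
# (an assumption made for this illustration only).
random.seed(1)
latent = [random.gauss(0, 1) for _ in range(300)]
group = [[v + random.gauss(0, 0.3) for v in latent] for _ in range(3)]

scores, weights, eigenvalue = first_component(group)
print(round(eigenvalue, 3), [round(v, 3) for v in weights])
```

The closer the eigenvalue is to the number of variables in the group, the better the synthesized variable summarizes them; the scores can then stand in for the group in the subsequent analysis.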

Keywords: variable clustering, latent variables
Components: VARHCA, VARKMeans, VARCLUS
Tutorial: en_Tanagra_VarClus.pdf
Dataset: crime_dataset_from_DASL.xls
References:
E. Vigneau and E. Qannari, « Clustering of variables around latent components », Communications in Statistics - Simulation and Computation, 32(4), 1131-1150, 2003.
SAS OnlineDoc – Version 8, « The VARCLUS Procedure ».