Thursday, April 30, 2009

Principal Component Analysis (PCA)

The PCA belongs to the factor analysis approaches. It is used to discover the underlying structure of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors (dimensions) and as such is a "non-dependent" procedure i.e. it does not assume a dependent variable is specified.

In this tutorial, we show how to implement this approach and how to interpret the results with Tanagra. We use the AUTOS_ACP.XLS dataset from the state-of-the-art SAPORTA’s book. The interest of this dataset is that we can compare our results with those described in the book (pages 177 to 181). We simply show the sequence of operations and the reading of the results tables in this tutorial. About the detailed interpretation, it is best to refer to the book.

Keywords: factor analysis, principal component analysis, correlation circle
Components: Principal Component Analysis, View Dataset, Scatterplot with labels, View multiple scatterplot
Tutorial: en_Tanagra_Acp.pdf
Dataset: autos_acp.xls
References:
G. Saporta, " Probabilités, Analyse de données et Statistique ", Dunod, 2006 ; pages 177 to 181.
D. Garson, "Factor Analysis".
Statsoft Textbook, "Principal components and factor analysis".