Tuesday, February 5, 2013

Exploratory Factor Analysis

PCA (Principal Component Analysis) is a dimension reduction technique that provides a synthetic description of a set of quantitative variables. It produces latent variables called principal components (or factors), which are linear combinations of the original variables. The number of useful components is much lower than the number of original variables because the latter are (more or less) correlated. PCA also reveals the internal structure of the data, because the components are constructed so as to explain as much of the variance of the data as possible.
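As a rough illustration of the computation described above, here is a minimal sketch in Python with NumPy. This is not the R program used in the tutorial. It diagonalizes the correlation matrix, sorts the components by decreasing eigenvalue (explained variance), and derives the loadings and component scores. The helper name `pca_from_correlation` and the simulated data are hypothetical.

```python
import numpy as np


def pca_from_correlation(X):
    """PCA by eigendecomposition of the correlation matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)              # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)          # R is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]             # sort by decreasing explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: correlations between each variable and each component
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # Component scores computed from the standardized data (population std, ddof=0)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = Z @ eigvecs
    return eigvals, loadings, scores


# Simulated data: three variables driven by one common factor, plus one pure-noise variable
rng = np.random.default_rng(0)
f = rng.normal(size=1000)
X = np.column_stack([f + 0.5 * rng.normal(size=1000) for _ in range(3)]
                    + [rng.normal(size=1000)])
eigvals, loadings, scores = pca_from_correlation(X)
print(np.round(eigvals, 2))                       # the first eigenvalue dominates
```

The eigenvalues sum to the number of variables (the trace of the correlation matrix), and each loading equals the correlation between a variable and a component. Keeping only the first few components yields the dimension reduction.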

PFA (Principal Factor Analysis) is often confused with PCA, and there has been significant controversy about whether the two techniques are equivalent. One way to distinguish them is to consider that the factors from PCA account for the maximum amount of variance of the available variables, while those from PFA account only for the common variance, i.e. the variance shared among the variables. The latter seems more appropriate if the goal of the analysis is to produce latent variables that highlight the underlying relationships between the original variables. In that case, the influence of variables that are unrelated to the others should be excluded.
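One way to make the distinction concrete is to look at the matrix being diagonalized. A common non-iterative variant, assumed here, replaces the 1s on the diagonal of the correlation matrix with prior communality estimates. These are the squared multiple correlations (SMC) of each variable with all the others, computed as 1 - 1/[R^-1]_jj. The sketch below uses Python with NumPy and is an illustration under that assumption, not the tutorial's code. The helper name `principal_factor` is hypothetical.

```python
import numpy as np


def principal_factor(X, n_factors):
    """Non-iterative principal factor analysis with SMC priors (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    # Squared multiple correlation of each variable with all the others
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    # Reduced correlation matrix: only common variance is kept on the diagonal
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    eigvals, eigvecs = np.linalg.eigh(R_reduced)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = n_factors
    # Negative eigenvalues can occur in the reduced matrix; clip them before sqrt
    loadings = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))
    communalities = (loadings ** 2).sum(axis=1)
    return smc, eigvals, loadings, communalities


# Simulated data: three variables driven by one common factor, plus one pure-noise variable
rng = np.random.default_rng(0)
f = rng.normal(size=1000)
X = np.column_stack([f + 0.5 * rng.normal(size=1000) for _ in range(3)]
                    + [rng.normal(size=1000)])
smc, eigvals, loadings, communalities = principal_factor(X, n_factors=1)
print(np.round(smc, 3))                           # the noise variable has an SMC near 0
```

Because the diagonal now holds only common variance, the eigenvalues sum to the total of the SMCs rather than to the number of variables. A variable unrelated to the others gets an SMC close to 0 and therefore contributes little to the factors.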

They therefore differ in the nature of the information they use. The nuance is subtle, however, especially since both are often grouped in the same tool in popular software (e.g. “PROC FACTOR” in SAS; “ANALYZE / DATA REDUCTION / FACTOR” in SPSS). In addition, their outputs and their interpretation are very similar.

In this tutorial, we present three approaches: Principal Component Analysis (PCA); non-iterative Principal Factor Analysis (PFA); and non-iterative Harris Component Analysis (Harris). We highlight their differences by comparing the matrix that each one diagonalizes (the correlation matrix for PCA). We detail each step of the calculations with an R program, and we check our results against those of SAS (PROC FACTOR). Thereafter, we implement these methods with Tanagra, with R using the psych package, and with SPSS.
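For the Harris approach, the sketch below follows our understanding of the description in the SAS PROC FACTOR documentation for METHOD=HARRIS. With S² the diagonal matrix of unique variances estimated as 1 - SMC (equivalently 1/[R^-1]_jj), the matrix S^-1 R S^-1 is diagonalized. Turning the eigenvectors V into loadings as L = S V (Λ - I)^(1/2), so that L L' approximates the reduced matrix R - S², is our own assumption; the tutorial may use a different scaling. Python with NumPy; the helper name `harris_components` is hypothetical.

```python
import numpy as np


def harris_components(X, n_factors):
    """Non-iterative Harris component analysis (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    # S^2: unique variances, 1 / [R^-1]_jj, i.e. 1 - SMC
    uniq = 1.0 / np.diag(np.linalg.inv(R))
    s_inv = 1.0 / np.sqrt(uniq)
    # Weighted matrix S^-1 R S^-1: variables with large unique variance are down-weighted
    W = R * np.outer(s_inv, s_inv)
    eigvals, eigvecs = np.linalg.eigh(W)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = n_factors
    # Assumed scaling: L = S V (Lambda - I)^(1/2); components with eigenvalue <= 1 get 0
    loadings = (np.sqrt(uniq)[:, None] * eigvecs[:, :k]
                * np.sqrt(np.clip(eigvals[:k] - 1.0, 0.0, None)))
    return uniq, eigvals, loadings


# Simulated data: three variables driven by one common factor, plus one pure-noise variable
rng = np.random.default_rng(0)
f = rng.normal(size=1000)
X = np.column_stack([f + 0.5 * rng.normal(size=1000) for _ in range(3)]
                    + [rng.normal(size=1000)])
uniq, eigvals, loadings = harris_components(X, n_factors=1)
print(np.round(loadings[:, 0], 3))
```

Since S^-1 (R - S²) S^-1 = S^-1 R S^-1 - I, the eigenvalues of the reduced weighted matrix are those of S^-1 R S^-1 minus 1. This is why, in this sketch, components with an eigenvalue at or below 1 contribute nothing.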

Keywords: PCA, principal component analysis, correlation matrix, principal factor analysis, harris, reproduced correlation, residual correlation, partial correlation, varimax rotation, R software, psych package, principal( ), fa( ), proc factor, SAS, SPSS
Components: PRINCIPAL COMPONENT ANALYSIS, PRINCIPAL FACTOR ANALYSIS, HARRIS COMPONENT ANALYSIS, FACTOR ROTATION
Tutorial: en_Tanagra_Principal_Factor_Analysis.pdf
Datasets: beer_rnd.zip
References:
D. Suhr, "Principal Component Analysis vs. Exploratory Factor Analysis".
Wikipedia, "Factor Analysis".