Thursday, November 6, 2008

PLS Regression

PLS (Partial Least Squares) regression can be viewed as a multivariate regression framework in which we want to predict the values of several target variables (Y1, Y2, …) from the values of several input variables (X1, X2, …).

Roughly speaking, the algorithm is the following: “The components of X are used to predict the scores on the Y components, and the predicted Y component scores are used to predict the actual values of the Y variables. In constructing the principal components of X, the PLS algorithm iteratively maximizes the strength of the relation of successive pairs of X and Y component scores by maximizing the covariance of each X-score with the Y variables. This strategy means that while the original X variables may be multicollinear, the X components used to predict Y will be orthogonal”.
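The description above can be illustrated with a minimal sketch of the classical NIPALS variant of PLS, written with NumPy. This is not the Tanagra implementation: the function name pls_nipals, its parameters, and the synthetic data are assumptions made for the example (the orange juice data are not reproduced here). The sketch extracts X-scores one at a time, deflates X and Y after each component, and then checks that the X-scores are orthogonal even though two input columns are nearly collinear.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Illustrative NIPALS PLS2 (hypothetical helper, not Tanagra's code)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean          # centered working copies
    n, p = E.shape
    q = F.shape[1]
    T = np.zeros((n, n_components))        # X-scores
    W = np.zeros((p, n_components))        # X-weights
    P = np.zeros((p, n_components))        # X-loadings
    Q = np.zeros((q, n_components))        # Y-loadings
    for a in range(n_components):
        # Start from the Y column with the largest remaining variance.
        u = F[:, np.argmax(F.var(axis=0))].copy()
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)         # weights maximizing cov(X-score, u)
            t = E @ w                      # X-score
            c = F.T @ t / (t @ t)          # regress Y residuals on the X-score
            u_new = F @ c / (c @ c)        # updated Y-score
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p_a = E.T @ t / (t @ t)
        E = E - np.outer(t, p_a)           # deflate X
        F = F - np.outer(t, c)             # deflate Y
        T[:, a], W[:, a], P[:, a], Q[:, a] = t, w, p_a, c
    # Coefficients mapping centered X directly to centered Y.
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return T, B, x_mean, y_mean

# Synthetic data (assumption for the demo): 30 rows, 4 inputs, 3 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=30)   # nearly collinear column
Y = X @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(30, 3))

T, B, x_mean, y_mean = pls_nipals(X, Y, n_components=2)
Y_hat = (X - x_mean) @ B + y_mean
gram = T.T @ T                                   # off-diagonal terms should be ~0
print(np.round(gram, 6))
```

The off-diagonal entries of the printed matrix should be numerically zero, which is the orthogonality property mentioned in the quotation. New observations can be predicted in the same way as Y_hat, by centering them with x_mean before multiplying by B.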

The dataset describes 6 orange juices characterized by 16 physicochemical descriptors and evaluated by 96 judges [Source: Tenenhaus, M., Pagès, J., Ambroisine, L. and Guinot, C. (2005). PLS methodology for studying relationships between hedonic judgements and product characteristics. Food Quality and Preference, 16(4), pp. 315-325].

Keywords: pls regression, factorial analysis, multiple linear regression
Components: PLS Regression
Tutorial: en_Tanagra_PLS.pdf
Dataset: orange.bdm
References:
M. Tenenhaus, « La régression PLS – Théorie et pratique », Technip, 1998.
H. Abdi, "Partial Least Square Regression".
Garson, « Partial Least Squares Regression (PLS) », http://www2.chass.ncsu.edu/garson/PA765/pls.htm