Thursday, November 6, 2008

PLS Regression for Classification Tasks

PLS (Partial Least Squares) Regression can be viewed as a multivariate regression framework in which we want to predict the values of several target variables (Y1, Y2, …) from the values of several input variables (X1, X2, …).

Roughly speaking, the algorithm is the following: “The components of X are used to predict the scores on the Y components, and the predicted Y component scores are used to predict the actual values of the Y variables. In constructing the principal components of X, the PLS algorithm iteratively maximizes the strength of the relation of successive pairs of X and Y component scores by maximizing the covariance of each X-score with the Y variables. This strategy means that while the original X variables may be multicollinear, the X components used to predict Y will be orthogonal”.
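The orthogonality claim in the description above can be checked numerically. The following is a minimal sketch, assuming a single target variable and a NIPALS-style deflation of X; the function name `pls1_x_scores` and the synthetic data are hypothetical, introduced only for illustration, and this is not necessarily how any particular software computes the components.

```python
import numpy as np

def pls1_x_scores(X, y, n_components):
    """Return the X-scores of a single-target PLS fit (NIPALS-style deflation)."""
    Xr = X - X.mean(axis=0)   # centered working copy of X
    yc = y - y.mean()         # centered target
    scores = []
    for _ in range(n_components):
        w = Xr.T @ yc              # direction of maximal covariance with y
        w /= np.linalg.norm(w)     # unit-length weight vector
        t = Xr @ w                 # X-score for this component
        p = Xr.T @ t / (t @ t)     # X-loading
        Xr = Xr - np.outer(t, p)   # deflate X before extracting the next component
        scores.append(t)
    return np.column_stack(scores)

# Hypothetical data: the third column is almost a copy of the first (multicollinear).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=50)])
y = base[:, 0] - 2.0 * base[:, 1] + 0.1 * rng.normal(size=50)

T = pls1_x_scores(X, y, n_components=2)
print(np.round(T.T @ T, 6))  # off-diagonal entries should be numerically zero
```

Although two of the input columns are nearly identical, the cross-products between different score vectors vanish up to rounding error, which is what makes the components usable as regressors.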

PLS Regression was originally defined for the prediction of continuous target variables. But it also appears to be useful in supervised learning problems where we want to predict the values of a discrete attribute. In this tutorial, we present a few variants of PLS Regression adapted to the prediction of a discrete variable. The generic name "PLS-DA" (Partial Least Squares Discriminant Analysis) is often used in the literature.
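One common way to adapt PLS Regression to a discrete target is to code the class attribute as 0/1 indicator columns, fit a PLS regression on them, and assign each instance to the class with the largest predicted value. The sketch below illustrates that idea only; it is an assumption about the general approach, not a description of the C-PLS, PLS-DA, or PLS-LDA components. The function `fit_pls_da` and the synthetic two-class data are hypothetical.

```python
import numpy as np

def fit_pls_da(X, labels, n_components):
    """Fit a PLS regression on an indicator-coded target; return a predict function."""
    classes = np.unique(labels)
    # One 0/1 indicator column per class.
    Y = (labels[:, None] == classes[None, :]).astype(float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yc = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the X-Y cross-product matrix.
        w = np.linalg.svd(Xr.T @ Yc, full_matrices=False)[0][:, 0]
        t = Xr @ w                 # X-score
        tt = t @ t
        p = Xr.T @ t / tt          # X-loading
        q = Yc.T @ t / tt          # Y-loading (regression of Y on the score)
        Xr = Xr - np.outer(t, p)   # deflate X
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(M) for M in (W, P, Q))
    # Regression coefficients expressed on the (centered) original inputs.
    B = W @ np.linalg.solve(P.T @ W, Q.T)

    def predict(X_new):
        Y_hat = (X_new - x_mean) @ B + y_mean     # predicted indicator values
        return classes[np.argmax(Y_hat, axis=1)]  # class with the largest prediction
    return predict

# Hypothetical two-class data: class 1 is shifted on the first two inputs.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(40, 5))
X1 = rng.normal(size=(40, 5))
X1[:, :2] += 4.0
X = np.vstack([X0, X1])
labels = np.array([0] * 40 + [1] * 40)

predict = fit_pls_da(X, labels, n_components=2)
print("training accuracy:", np.mean(predict(X) == labels))
```

The accuracy printed here is measured on the training data, so it is optimistic; a held-out sample or cross-validation would be needed for an honest error estimate.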

Keywords: pls regression, discriminant analysis, supervised learning
Components: C-PLS, PLS-DA, PLS-LDA
Tutorial: en_Tanagra_PLS_DA.pdf
Dataset: breast-cancer-pls-da.xls
References:
S. Chevallier, D. Bertrand, A. Kohler, P. Courcoux, « Application of PLS-DA in multivariate image analysis », J. Chemometrics, 20: 221-229, 2006.
Garson, « Partial Least Squares Regression (PLS) », http://www2.chass.ncsu.edu/garson/PA765/pls.htm