Thursday, November 6, 2008

STEPDISC - Feature selection for LDA

In this tutorial, we use stepwise discriminant analysis (STEPDISC) to determine the relevant variables for a classification task.

STEPDISC (Stepwise Discriminant Analysis) is always associated with discriminant analysis because it relies on the same criterion, i.e. WILKS’ LAMBDA. It is therefore often presented as a method intended specifically for discriminant analysis. In fact, it can be useful for various linear models because they share the same representation bias (e.g. logistic regression, linear SVM, etc.). However, it is not really suited to non-linear models such as nearest neighbor or the multilayer perceptron.
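For reference, the criterion can be written as follows. This is the usual textbook definition, not a formula taken from the tutorial; the symbols W (within-group sum of squares and cross-products matrix), T (total sum of squares and cross-products matrix), K (number of groups), n_k, \bar{x}_k and \bar{x} are notation introduced here for illustration.

```latex
\Lambda = \frac{\det(W)}{\det(T)}, \qquad
W = \sum_{k=1}^{K} \sum_{i \in \text{group } k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^{\top}, \qquad
T = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}
```

Lambda lies between 0 and 1. A value close to 0 means that the group centroids are well separated relative to the within-group dispersion; a value close to 1 means that the selected variables hardly discriminate between the groups.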

We implement the FORWARD and the BACKWARD strategies in TANAGRA. In the FORWARD approach, at each step, we determine the variable that contributes the most to the discrimination between the groups. We add this variable if its contribution is significant. The process stops when there is no attribute left to add to the model. In the BACKWARD approach, we begin with the complete model containing all descriptors. We search for the least relevant variable. We remove this variable if its removal does not significantly deteriorate the discrimination between the groups. The process stops when there is no variable left to remove.
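The two strategies described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not TANAGRA's implementation: the function names (`wilks_lambda`, `forward_stepdisc`, `backward_stepdisc`), the synthetic data and the F thresholds of 3.84 are assumptions made for the example. Significance is judged here with the usual partial F statistic derived from the ratio of two Wilks' lambdas, compared with a fixed threshold rather than with a p-value.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda det(W) / det(T) for the variables in `cols` (1.0 if empty)."""
    if not cols:
        return 1.0
    Z = X[:, cols]
    centered = Z - Z.mean(axis=0)
    T = centered.T @ centered            # total sum of squares and cross-products
    W = np.zeros_like(T)                 # pooled within-group counterpart
    for g in np.unique(y):
        Zg = Z[y == g]
        cg = Zg - Zg.mean(axis=0)
        W += cg.T @ cg
    return np.linalg.det(W) / np.linalg.det(T)

def partial_f(lam_with, lam_without, n, k, q):
    """Partial F of one variable, q = number of OTHER variables in the model."""
    partial = lam_with / lam_without
    return (n - k - q) / (k - 1) * (1.0 - partial) / partial

def forward_stepdisc(X, y, f_to_enter=3.84):
    """Add the best variable at each step while its partial F is large enough."""
    n, p = X.shape
    k = len(np.unique(y))
    selected, remaining = [], list(range(p))
    while remaining and n - k - len(selected) > 0:
        lam_current = wilks_lambda(X, y, selected)
        scores = {j: partial_f(wilks_lambda(X, y, selected + [j]), lam_current,
                               n, k, len(selected)) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_to_enter:    # no significant candidate left
            break
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_stepdisc(X, y, f_to_remove=3.84):
    """Start from all variables and drop the weakest one while it is not significant."""
    n, p = X.shape
    k = len(np.unique(y))
    selected = list(range(p))
    while selected:
        lam_current = wilks_lambda(X, y, selected)
        scores = {}
        for j in selected:
            others = [c for c in selected if c != j]
            scores[j] = partial_f(lam_current, wilks_lambda(X, y, others),
                                  n, k, len(others))
        worst = min(scores, key=scores.get)
        if scores[worst] >= f_to_remove:  # every variable is still useful
            break
        selected.remove(worst)
    return selected

# Synthetic illustration (hypothetical data): only column 2 separates the two groups.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.repeat([0, 1], 100)
X_demo[:, 2] += 3.0 * y_demo

selected_forward = forward_stepdisc(X_demo, y_demo)
selected_backward = backward_stepdisc(X_demo, y_demo)
print("forward:", selected_forward)
print("backward:", selected_backward)
```

On data like this, the discriminating column should enter first in the forward search and survive the backward search. Noise columns may occasionally pass the threshold by chance, which is one reason to validate the selected subset with a resampling estimate of the error rate.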

Keywords: stepdisc, feature selection, linear discriminant analysis
Components: Supervised Learning, Linear discriminant analysis, Bootstrap, Stepdisc
Tutorial: en_Tanagra_Stepdisc.pdf
Dataset: sonar_for_stepdisc.xls
Reference: SAS/STAT User’s Guide, « The STEPDISC Procedure »