Monday, November 5, 2012

Linear Discriminant Analysis - Tools comparison

Linear discriminant analysis is a popular method in domains of statistics, machine learning and pattern recognition. Indeed, it has interesting properties: it is rather fast on large bases; it can handle naturally multi-class problems (target attribute with more than 2 values); it generates a linear classifier linear, easy to interpret; it is robust and fairly stable, even applied on small databases; it has an embedded variable selection mechanism. Personally, I appreciate linear discriminant analysis because we can have multiple interpretations (probabilistic, geometric), and thus highlights various aspects of supervised learning.

In this tutorial, we highlight the similarities and the differences between the outputs of Tanagra, R (MASS and klaR packages), SAS, and SPSS software. The main conclusion is that, if the presentation is not always the same, ultimately we have exactly the same results. This is the most important.

Keywords: linear discriminant analysis, predictive discriminant analysis, canonical discriminant analysis, variable selection, feature selection, sas, stepdisc, candisc, R software, xlsx package, MASS package, lda, klaR package, greedy.wilks, confusion matrix, resubstitution error rate
Tutorial: en_Tanagra_LDA_Comparisons.pdf
Dataset: alcohol
References :
Wikipedia - "Linear Discriminant Analysis"