Friday, December 26, 2008

Logistic regression - Software comparison

Logistic regression is a popular supervised learning method.

There are several reasons for this. The theoretical foundation of the method is attractive: it fits within the framework of generalized linear models, so logistic regression is a well-identified variant that one chooses according to the type of the dependent variable (class attribute). Its predictive performance is comparable to that of other approaches. Furthermore, it provides several indicators for interpreting the results. Among them, the well-known odds ratio makes it possible to quantify the contribution of each predictor.
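To make the odds-ratio interpretation concrete, here is a minimal sketch in Python. The intercept and coefficient values are hypothetical and chosen only for illustration; they do not come from the tutorial. The sketch checks numerically that, in a logistic model, increasing a predictor by one unit multiplies the odds of the positive class by exp(beta).

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds associated with a probability p."""
    return p / (1.0 - p)

def odds_ratio(beta):
    """Odds ratio for a one-unit increase of a predictor with coefficient beta."""
    return math.exp(beta)

# Hypothetical fitted model with a single predictor: logit(p) = intercept + beta * x
intercept, beta = -1.0, 0.7

p_at_2 = sigmoid(intercept + beta * 2.0)  # probability when x = 2
p_at_3 = sigmoid(intercept + beta * 3.0)  # probability when x = 3

# Ratio of the odds after and before the one-unit increase
empirical_ratio = odds(p_at_3) / odds(p_at_2)

print("odds ratio from coefficient:", odds_ratio(beta))
print("odds ratio from probabilities:", empirical_ratio)
```

The two printed values agree up to rounding, whatever starting value of x is used. This is why the odds ratio summarizes the effect of a predictor in a single number.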

Logistic regression is available in many free tools. In this tutorial, we compare the implementations of this technique in Tanagra 1.4.27, R 2.7.2 (GLM command), Orange 1.0b2, Weka 3.5.6, and the RWeka 0.3-13 package for R. Beyond the comparison, this tutorial is also an opportunity to show how to carry out the following sequence of operations with these tools: importing an ARFF file (Weka file format); splitting the data into a learning set and a test set; computing the predictive model on the learning set; testing the model on the test set; selecting the relevant variables using a criterion consistent with logistic regression; and evaluating the performance of the simplified model again.
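The sequence of operations above can be sketched in plain Python, independently of the tools compared in the tutorial. This is an illustration under stated assumptions, not the procedure used by any of these tools: the data are synthetic (one relevant column and two noise columns, standing in for the tutorial's dataset), the model is fitted by simple gradient ascent, and variable selection is a backward elimination based on the likelihood-ratio statistic with the usual 3.84 threshold (chi-square, 1 degree of freedom, 5% level). ARFF import is omitted, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_data(n, seed=0):
    """Synthetic stand-in data: column 0 drives the class, columns 1-2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        X.append(row)
        y.append(1 if rng.random() < sigmoid(3.0 * row[0]) else 0)
    return X, y

def split(X, y, test_fraction=0.3, seed=1):
    """Random split into a learning set and a test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def score(w, row, cols):
    """Linear score: intercept w[0] plus weighted selected columns."""
    return w[0] + sum(wj * row[c] for wj, c in zip(w[1:], cols))

def fit(X, y, cols, steps=500, lr=1.0):
    """Maximum likelihood by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(cols) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            err = yi - sigmoid(score(w, row, cols))
            grad[0] += err
            for j, c in enumerate(cols):
                grad[j + 1] += err * row[c]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def log_lik(w, X, y, cols):
    """Log-likelihood, written in a numerically stable form."""
    total = 0.0
    for row, yi in zip(X, y):
        z = score(w, row, cols)
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def accuracy(w, X, y, cols):
    hits = sum((score(w, row, cols) >= 0) == (yi == 1) for row, yi in zip(X, y))
    return hits / len(X)

def backward_select(X, y, cols, threshold=3.84):
    """Drop the weakest variable while its likelihood-ratio statistic is below threshold."""
    cols = list(cols)
    while cols:
        full = log_lik(fit(X, y, cols), X, y, cols)
        stats = []
        for c in cols:
            reduced = [k for k in cols if k != c]
            stats.append((2.0 * (full - log_lik(fit(X, y, reduced), X, y, reduced)), c))
        lr_stat, weakest = min(stats)
        if lr_stat >= threshold:
            break
        cols.remove(weakest)
    return cols

def run_pipeline(seed=0):
    X, y = make_data(400, seed)
    Xtr, ytr, Xte, yte = split(X, y)
    all_cols = [0, 1, 2]
    acc_full = accuracy(fit(Xtr, ytr, all_cols), Xte, yte, all_cols)
    kept = backward_select(Xtr, ytr, all_cols)
    acc_simplified = accuracy(fit(Xtr, ytr, kept), Xte, yte, kept)
    return {"kept": kept, "acc_full": acc_full, "acc_simplified": acc_simplified}

result = run_pipeline()
print(result)
```

The printed dictionary gives the indices of the retained columns and the test accuracy before and after selection. In practice, the tools compared here typically rely on Newton-type optimization rather than plain gradient ascent, so this sketch should be read only as an outline of the workflow.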

Keywords: logistic regression, supervised learning, software comparison
Components: BINARY LOGISTIC REGRESSION, SUPERVISED LEARNING, TEST, DISCRETE SELECT EXAMPLES
Tutorial: en_Tanagra_Perfs_Reg_Logistique.pdf
Dataset: wave_2_classes_with_irrelevant_attributes.zip
References:
D. Garson, "Logistic Regression"
Wikipedia, "Logistic Regression"