Monday, May 19, 2014

Sipina - Version 3.12

Data transfer between the Excel spreadsheet and Sipina has been improved for databases of moderate size (for large databases, with several hundred thousand rows, it is better to import the data directly from a text file in TXT format). The handling of the decimal point has also been improved, and the automatic processing is now much faster than before.

The precision of the numerical cut points displayed in a decision tree is now customizable. A new item in the "Tree Management" menu gives access to this setting.

Sipina website: Sipina
Download: Setup file

Saturday, May 3, 2014

Binary classification via regression (slides)

In these slides, we study the analogy between linear discriminant analysis and regression on an indicator variable in a binary classification problem.

The tests for the global significance of the model and for the individual significance of the coefficients are equivalent. The coefficients are proportional and, when the classes are balanced, so are the intercepts. When the classes are unbalanced, an additional correction of the regression intercept is needed to obtain the linear discriminant analysis intercept.
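
The balanced case can be checked numerically. The following is a minimal sketch and is not taken from the slides: the simulated data, the -1/+1 coding of the indicator variable, and all variable names are assumptions made for illustration. It fits an ordinary least squares regression on the indicator, computes the linear discriminant function from the pooled covariance matrix, and compares the two sets of coefficients. It also compares 1 - R2 from the regression with Wilks' lambda, which is one way to see why the global tests agree.

```python
import numpy as np

# Simulated balanced two-class data (assumption: Gaussian classes, 3 predictors).
rng = np.random.default_rng(0)
n_per_class, p = 200, 3
X0 = rng.normal(loc=0.0, size=(n_per_class, p))
X1 = rng.normal(loc=1.0, size=(n_per_class, p))
X = np.vstack([X0, X1])
# Assumption: the indicator is coded -1/+1, so its mean is 0 when classes are balanced.
y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])

# Regression on the indicator variable (ordinary least squares with intercept).
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
ols_intercept, ols_slopes = beta[0], beta[1:]

# Linear discriminant function: w = S_pooled^{-1} (mu1 - mu0),
# intercept = -0.5 * (mu0 + mu1)' w (no prior term since the classes are balanced).
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
W = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)  # within-class scatter
pooled = W / (len(y) - 2)
lda_slopes = np.linalg.solve(pooled, mu1 - mu0)
lda_intercept = -0.5 * (mu0 + mu1) @ lda_slopes

# Proportionality: every ratio should equal the same constant.
slope_ratios = ols_slopes / lda_slopes
intercept_ratio = ols_intercept / lda_intercept

# Global fit: 1 - R2 of the regression versus Wilks' lambda = det(W) / det(T).
residuals = y - A @ beta
r2 = 1.0 - (residuals @ residuals) / ((y - y.mean()) @ (y - y.mean()))
Xc = X - X.mean(axis=0)
T = Xc.T @ Xc  # total scatter
wilks_lambda = np.linalg.det(W) / np.linalg.det(T)

print("slope ratios:   ", slope_ratios)
print("intercept ratio:", intercept_ratio)
print("1 - R2:", 1.0 - r2, " Wilks' lambda:", wilks_lambda)
```

With unequal class sizes, the slope ratios should remain constant while the intercept ratio no longer matches, which is where the intercept correction comes in.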

For multiclass classification, the equivalence between regression and linear discriminant analysis no longer holds.

Keywords: supervised learning, linear discriminant analysis, multiple linear regression, R2, Wilks' lambda
Slides: Classification via regression
References:
R.O. Duda, P.E. Hart, D. Stork, « Pattern Classification », 2nd Edition, Wiley, 2000.
C.J. Huberty, S. Olejnik, « Applied MANOVA and Discriminant Analysis », Wiley, 2006.