The transfer of data between the Excel spreadsheet and Sipina has been improved for databases of moderate size (for large databases, with several hundred thousand rows, it is better to import the data file directly in text format, TXT). The handling of the decimal point has also been improved. The automatic processing is now much faster than before.
The precision of the numerical cut points displayed in a decision tree is now customizable. A new item is available for this in the "Tree Management" menu.
Sipina website: Sipina
Download: Setup file
This web log offers an alternative layout for the tutorials about Tanagra. Each entry briefly describes the subject, followed by links to the tutorial (PDF) and the dataset. Technical references (books, papers, websites, ...) are also provided. In some tutorials, we compare the results of Tanagra with those of other free software such as Knime, Orange, R, Python, Sipina or Weka.
Monday, May 19, 2014
Saturday, May 3, 2014
Binary classification via regression (slides)
In these slides, we study the analogy between linear discriminant analysis and regression on an indicator variable for a binary classification problem.
The tests for the global significance of the model and for the individual significance of the coefficients are equivalent. The coefficients are proportional, including the intercepts in the balanced case. In the unbalanced case, an additional correction of the regression intercept is needed to obtain the linear discriminant analysis intercept.
For multiclass classification, the equivalence between regression and linear discriminant analysis no longer holds.
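The proportionality of the coefficients and the link between the two global tests can be checked numerically. The following is only a minimal sketch, not taken from the slides: it assumes NumPy, synthetic data drawn at random (the sample sizes, the class means and all variable names are hypothetical), a 0/1 coding of the indicator variable, and the usual pooled within-class covariance for the discriminant direction. It deliberately uses unbalanced classes and leaves the intercept correction aside.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (hypothetical), unbalanced on purpose: 60 vs 140 rows.
n0, n1, p = 60, 140, 3
X0 = rng.normal(loc=0.0, scale=1.0, size=(n0, p))
X1 = rng.normal(loc=[1.0, 0.5, -0.5], scale=1.0, size=(n1, p))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n0), np.ones(n1)])  # 0/1 indicator variable

# Multiple linear regression of the indicator on X (with intercept).
A = np.column_stack([np.ones(len(y)), X])
beta = np.linalg.lstsq(A, y, rcond=None)[0]
b_reg = beta[1:]  # slopes only, the intercept is not compared here

# Linear discriminant direction: inverse pooled covariance times mean difference.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S_within = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
pooled_cov = S_within / (n0 + n1 - 2)
a_lda = np.linalg.solve(pooled_cov, m1 - m0)

# If the slopes are proportional, every component of this ratio is the same.
ratio = b_reg / a_lda
print("regression slope / LDA coefficient:", ratio)

# Global fit: R2 of the regression versus Wilks' lambda of the discriminant analysis.
resid = y - A @ beta
r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
Xc = X - X.mean(axis=0)
S_total = Xc.T @ Xc
wilks = np.linalg.det(S_within) / np.linalg.det(S_total)
print("1 - R2:", 1.0 - r2, " Wilks' lambda:", wilks)
```

On such data, the ratio is constant across the variables and Wilks' lambda coincides with 1 - R2, which is why the global significance tests lead to the same conclusion in the binary case.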
Keywords: supervised learning, linear discriminant analysis, multiple linear regression, R2, Wilks' lambda
Slides: Classification via regression
References:
R.O. Duda, P.E. Hart, D. Stork, « Pattern Classification », 2nd Edition, Wiley, 2000.
C.J. Huberty, S. Olejnik, « Applied MANOVA and Discriminant Analysis », Wiley, 2006.
Labels:
Regression analysis,
Supervised Learning