Friday, October 31, 2008

Performance evaluation using a predefined test set

TANAGRA, ORANGE and WEKA: Comparison of learning algorithms using predefined learning and test sets.

Very often, we use accuracy to compare the performance of learning algorithms, and we then select the most accurate method. For the comparison to be rigorous, every algorithm must be trained on the same learning set and evaluated on the same test set.

This tutorial shows how to implement this process in three data mining tools: TANAGRA, ORANGE and WEKA. We compare the performance of an SVM (linear kernel), a logistic regression and a decision tree.
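
Independently of the tool, the protocol can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the "status" field marking learning and test examples, and the two toy classifiers (MajorityClass, ThresholdStump) are hypothetical stand-ins, not the SVM, logistic regression and decision tree used in the tutorial, and not the layout of the actual dataset. The point it shows is that every model is fitted on the same learning examples and scored on the same test examples.

```python
from collections import Counter


class MajorityClass:
    """Toy baseline: always predicts the most frequent learning-set label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdStump:
    """Toy rule: predicts 'pos' when feature 0 exceeds the midpoint of the class means."""

    def fit(self, X, y):
        pos = [x[0] for x, label in zip(X, y) if label == "pos"]
        neg = [x[0] for x, label in zip(X, y) if label == "neg"]
        self.cut_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return ["pos" if x[0] > self.cut_ else "neg" for x in X]


def accuracy(y_true, y_pred):
    """Fraction of test examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical records: (features, label, status). The status field is an
# assumption used here to mark the predefined learning/test partition.
rows = [
    ([1.0], "neg", "train"), ([2.0], "neg", "train"), ([3.0], "neg", "train"),
    ([7.0], "pos", "train"), ([8.0], "pos", "train"),
    ([1.5], "neg", "test"), ([2.5], "neg", "test"),
    ([7.5], "pos", "test"), ([9.0], "pos", "test"),
]

# Split once, then reuse the same two subsets for every model.
X_train = [f for f, _, s in rows if s == "train"]
y_train = [l for _, l, s in rows if s == "train"]
X_test = [f for f, _, s in rows if s == "test"]
y_test = [l for _, l, s in rows if s == "test"]

models = {"majority": MajorityClass(), "stump": ThresholdStump()}
results = {
    name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}
print(results)
```

Because the partition is fixed before any model is fitted, the differences between the accuracies reflect the methods themselves rather than a lucky or unlucky split.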

Keywords: supervised learning, decision tree, svm, logistic regression, classifier assessment, train and test set, orange, weka
Components: Select examples, Supervised learning, Binary logistic regression, C-RT, C-SVC, Test
Tutorial: en_Tanagra_TOW_Predefined_Test_Set.pdf
Dataset: breast_tow.zip