Friday, October 31, 2008

Learning a classification tree - Software comparison

Learning a decision tree with TANAGRA, ORANGE and WEKA. Estimating the error rate using cross-validation.

When we build a decision tree from a dataset, we must follow these steps (not necessarily in this order):
• Import the dataset into the software;
• Select the class attribute (TARGET) and the descriptors (INPUT);
• Choose the induction algorithm; depending on the implementation, we can obtain slightly different results;
• Run the learning process and view the resulting decision tree;
• Use cross-validation in order to obtain an honest error rate estimate.

In this tutorial, we show how to perform these operations using various free software packages.
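
The tutorial itself works through the graphical interfaces of the three tools, but the same workflow can also be scripted. Here is a minimal sketch using Weka's Java API, assuming the heart data have been converted to ARFF beforehand (the file name heart.arff is hypothetical) and that the class attribute is the last column; it uses J48, Weka's C4.5 implementation, rather than Tanagra's C-RT component.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HeartTreeDemo {
    public static void main(String[] args) throws Exception {
        // 1. Import the dataset (assumes heart.txt was converted to ARFF beforehand)
        Instances data = new DataSource("heart.arff").getDataSet();

        // 2. Select the class attribute (TARGET); here we assume it is the last column,
        //    the remaining attributes are used as descriptors (INPUT)
        data.setClassIndex(data.numAttributes() - 1);

        // 3. Choose the induction algorithm (J48 = Weka's C4.5), learn and view the tree
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // 4. 10-fold cross-validation for an honest error rate estimate
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println("Cross-validation error rate: " + eval.errorRate());
        System.out.println(eval.toSummaryString());
    }
}

The graphical tools described in the tutorial perform exactly these steps through menus and diagram components; only the presentation of the tree and of the cross-validation report differs from one package to another.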

Keywords: supervised learning, decision tree, classification tree, classifier assessment, performance evaluation, resampling method, cross-validation, orange, weka
Components: Supervised learning, C-RT, Cross validation
Tutorial: en_Tanagra_TOW_Decision_Tree.pdf
Dataset: heart.txt
References:
R. Rakotomalala, "Arbres de décision", Revue Modulad, 33, 163-187, 2005 (tutoriel_arbre_revue_modulad_33.pdf) (in French)
Wikipedia, "Decision tree learning"