Monday, November 3, 2008

Classifier comparison - Cross validation

Classifiers are often compared on their accuracy in order to select the best one. To do this, we need a reliable estimate of the error rate.

Most of the time, we use a test set, a part of the dataset that is not used during the learning phase. This gives an unbiased estimate of the error rate. However, this strategy is not feasible when we have a small dataset: reserving part of the data for the classifier evaluation leaves fewer instances for training and penalizes the learning process.
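To make the test-set strategy concrete, here is a minimal sketch in plain Python. It is not the Tanagra procedure from the tutorial: the helper names (`holdout_split`, `error_rate`, `fit_majority`), the 30% test fraction, and the toy data are all assumptions chosen for illustration, and the majority-class "classifier" is only a stand-in.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def error_rate(predict, test):
    """Fraction of test rows (features, label) that predict() gets wrong."""
    wrong = sum(1 for x, y in test if predict(x) != y)
    return wrong / len(test)

def fit_majority(train):
    """Stand-in classifier (assumption): always predicts the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

# Toy data, invented for the example: 30 rows, one feature, two classes.
rows = [((i,), "pos" if i % 3 == 0 else "neg") for i in range(30)]

train, test = holdout_split(rows)
model = fit_majority(train)          # the test rows are never seen here
print(len(train), len(test), error_rate(model, test))
```

With only 30 rows, the 9 held-out instances are lost to the learning phase, which is exactly the drawback described above.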

With a small dataset, it is more judicious to use resampling approaches such as cross validation. This tutorial shows how to implement cross validation when we compare two classifiers.
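The idea can be sketched outside Tanagra in a few lines of plain Python. This is an illustration under stated assumptions, not the tutorial's implementation: the data are synthetic (not dr_heart.bdm), all function names are hypothetical, and a one-split decision stump stands in for a full decision tree such as ID3. Both classifiers are evaluated on the same folds so that their error rates are comparable.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_knn(train, k=3):
    """k-nearest-neighbor classifier using squared Euclidean distance."""
    def predict(x):
        dist = lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
        nearest = sorted(train, key=dist)[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict

def fit_stump(train):
    """One-split decision stump: a deliberately simple stand-in for a decision tree."""
    majority = lambda rows: Counter(y for _, y in rows).most_common(1)[0][0]
    best = None
    for j in range(len(train[0][0])):
        for t in sorted({x[j] for x, _ in train}):
            left = [r for r in train if r[0][j] <= t]
            right = [r for r in train if r[0][j] > t]
            if not left or not right:
                continue
            ly, ry = majority(left), majority(right)
            errors = sum(y != ly for _, y in left) + sum(y != ry for _, y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, ly, ry)
    if best is None:                      # no valid split: predict the majority class
        label = majority(train)
        return lambda x: label
    _, j, t, ly, ry = best
    return lambda x: ly if x[j] <= t else ry

def cross_val_error(rows, fit, folds):
    """Train on k-1 folds, test on the remaining one, pool the errors over all folds."""
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        predict = fit(train)
        wrong += sum(predict(rows[i][0]) != rows[i][1] for i in fold)
    return wrong / len(rows)

# Synthetic two-class data (assumption): 40 points per class, two features.
rng = random.Random(1)
rows = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label)
        for c, label in ((0.0, "neg"), (2.0, "pos")) for _ in range(40)]

folds = k_fold_indices(len(rows), k=10)   # same folds for both classifiers
for name, fit in (("k-NN", fit_knn), ("stump", fit_stump)):
    print(name, cross_val_error(rows, fit, folds))
```

Every instance is used once for testing and k-1 times for training, so no part of a small dataset is permanently withheld from the learning phase.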

Keywords: cross validation, resampling method, classifier comparison, classifier assessment, nearest neighbor, k-nn, decision tree, id3
Components: Supervised Learning, K-NN, Cross-validation
Tutorial: en_dr_comparer_spv_learning.pdf
Dataset: dr_heart.bdm