Monday, November 3, 2008

ROC Curve for classifier comparison

ROC graphs make it possible to compare two or more supervised learning algorithms. They have properties that make them especially useful for domains with skewed class distributions and unequal classification error costs.

An ROC graph depicts the relative trade-off between the true positive rate and the false positive rate. It requires a continuous output from the classifier, such as an estimate of an instance’s class membership probability. In fact, a “score”, a numeric value that represents the degree to which an instance is a member of a class, is sufficient.
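To make the construction concrete, here is a minimal Python sketch of how ROC points can be derived from scores by sweeping a decision threshold from the highest score to the lowest. The function name `roc_points`, the 0/1 label encoding, and the toy labels and scores are assumptions made for illustration; this is not Tanagra's implementation and the values are not taken from the heart dataset.

```python
def roc_points(labels, scores):
    """Return the ROC points (false positive rate, true positive rate).

    labels: 1 for a positive instance, 0 for a negative one (assumed encoding).
    scores: higher values mean "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    # Rank the instances by decreasing score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so tied scores move together.
        if prev_score is not None and score != prev_score:
            points.append((fp / n_neg, tp / n_pos))
        if label == 1:
            tp += 1
        else:
            fp += 1
        prev_score = score
    points.append((fp / n_neg, tp / n_pos))  # threshold below every score: (1, 1)
    return points


# Hypothetical toy data.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(toy_labels, toy_scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Only the ordering of the scores matters here, which is why any monotone score works as well as a calibrated probability.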

The AUC (Area Under the Curve) reduces ROC performance to a single scalar value, which makes it possible to compare several classifiers. This area is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
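The ranking interpretation can be checked directly: the following Python sketch estimates the AUC by counting, over all positive/negative pairs, how often the positive instance receives the higher score. The function name `auc_by_ranking`, the convention of counting ties as one half, and the toy data are assumptions for illustration, not the procedure used by Tanagra.

```python
def auc_by_ranking(labels, scores):
    """Estimate the AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive instance, 0 for a negative one (assumed encoding).
    scores: higher values mean "more likely positive".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # the positive instance is ranked above the negative one
            elif p == n:
                wins += 0.5   # a tie counts as half a win (a common convention)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 of the 9 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_by_ranking(toy_labels, toy_scores))
```

A value of 1.0 means every positive instance outranks every negative one, while 0.5 corresponds to a random ordering. The double loop is quadratic in the sample size, so it is suited to illustration rather than to large datasets.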

In this tutorial, we compare linear discriminant analysis (LDA) and a support vector machine (SVM) on a heart disease detection problem.

Keywords: roc curve, roc graphs, auc, area under curve, classifier performance comparison, linear discriminant analysis, svm, support vector machine, scoring
Components: Sampling, 0_1_Binarize, Supervised Learning, Scoring, Roc curve, SVM, Linear discriminant analysis
Tutorial: en_Tanagra_Roc_Curve.pdf
Dataset: dr_heart.bdm
References:
T. Fawcett, "ROC Graphs: Notes and Practical Considerations for Researchers"
Wikipedia, "Receiver operating characteristic"