Tanagra - Data Mining and Data Science Tutorials: Classifier comparison

Monday, November 3, 2008

Classifier comparison - Using a predefined test set

To evaluate a supervised learning algorithm, we often split the dataset into a training set, which is used to build the model, and a test set, which is used to obtain an unbiased estimate of the error rate.

TANAGRA provides sampling components that randomly subdivide the dataset, but in some circumstances the user wants to use a predefined test set for the comparison. This is especially useful when we want to compare the performance of classifiers implemented in different software packages, because every tool is then evaluated on exactly the same examples.
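To make the idea concrete outside of TANAGRA, here is a minimal sketch in plain Python. It is not the TANAGRA procedure, only an illustration of the principle. The toy rows, the "status" column that marks the predefined split, and the helper names (`split_by_status`, `predict_1nn`, `predict_majority`, `evaluate`) are all assumptions made for this example. Two simple classifiers, a 1-nearest-neighbor rule and a majority-class baseline, are trained on the same training rows and scored on the same predefined test rows.

```python
from collections import Counter

# Hypothetical toy data: (features, label, status).
# The "status" field marks the predefined split; the real column
# names of the tutorial's dataset are not assumed here.
ROWS = [
    ((0.0, 0.1), "rock", "train"),
    ((0.2, 0.0), "rock", "train"),
    ((0.1, 0.0), "rock", "train"),
    ((1.0, 0.9), "mine", "train"),
    ((0.9, 1.1), "mine", "train"),
    ((0.1, 0.2), "rock", "test"),
    ((1.1, 1.0), "mine", "test"),
    ((0.6, 0.6), "rock", "test"),
    ((0.8, 0.9), "mine", "test"),
]


def split_by_status(rows):
    """Split rows into train and test lists using the predefined status flag."""
    train = [(x, y) for x, y, status in rows if status == "train"]
    test = [(x, y) for x, y, status in rows if status == "test"]
    return train, test


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict_1nn(train, x):
    """Return the label of the training example closest to x."""
    return min(train, key=lambda example: squared_distance(example[0], x))[1]


def predict_majority(train, x):
    """Baseline: always return the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def evaluate(predict, train, test):
    """Return (error rate, confusion matrix) measured on the test set only.

    The confusion matrix is a Counter keyed by (actual, predicted).
    """
    confusion = Counter()
    for x, actual in test:
        confusion[(actual, predict(train, x))] += 1
    errors = sum(n for (actual, predicted), n in confusion.items()
                 if actual != predicted)
    return errors / len(test), confusion


train, test = split_by_status(ROWS)
results = {
    name: evaluate(predict, train, test)
    for name, predict in [("1-NN", predict_1nn), ("majority", predict_majority)]
}

for name, (error_rate, confusion) in results.items():
    print(name, "error rate:", error_rate, "confusion:", dict(confusion))
```

Because the split is fixed in the data rather than drawn at random, the error rates of the two classifiers are directly comparable, and the same rows can be handed to another tool to reproduce the comparison.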

Keywords: supervised learning, classifier comparison, train and test set, error rate, confusion matrix, linear discriminant analysis, support vector machine, nearest neighbor classifier
Components: Select examples, Supervised learning, Linear discriminant analysis, SVM, K-NN
Tutorial: Classifier comparison
Dataset: sonar_with_test_set.xls