Tuesday, January 27, 2009

Performance comparison under Linux

The gain chart is an alternative to the confusion matrix for evaluating a classifier. Its name varies from one tool to another (e.g. lift curve, lift chart, cumulative gain chart).

The main idea is to build a graph, with the instances ranked by decreasing score, where the X coordinate is the percentage of the population and the Y coordinate is the percentage of the positive instances of the class attribute found in that share of the population. The gain chart is used mainly in marketing, where we want to detect potential customers, but it can be used in other situations.
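
To make the construction concrete, here is a minimal Python sketch of how the points of a gain chart can be computed. It is not the procedure used by Tanagra or by the other tools; the function name `cumulative_gain` and the toy scores and labels are assumptions chosen for illustration.

```python
def cumulative_gain(scores, labels, positive=1):
    """Return the (percent of population, percent of positives) points."""
    # Rank the instances from the highest score to the lowest.
    # Python's sort is stable, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    n_pos = sum(1 for _, y in ranked if y == positive)
    if n == 0 or n_pos == 0:
        raise ValueError("need at least one instance and one positive")

    points = [(0.0, 0.0)]  # the curve starts at the origin
    found = 0
    for i, (_, y) in enumerate(ranked, start=1):
        if y == positive:
            found += 1
        # X: share of the population seen so far; Y: share of positives found.
        points.append((100.0 * i / n, 100.0 * found / n_pos))
    return points


# Hypothetical toy data: five instances, three of them positive.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = cumulative_gain(scores, labels)
for x, y in points:
    print(f"{x:6.1f}% of population -> {y:6.1f}% of positives")
```

On this toy data, the first 40% of the ranked population already contains two of the three positives. A classifier that scores at random would, on average, follow the diagonal.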

The construction of the gain chart is already outlined in a previous tutorial (see http://data-mining-tutorials.blogspot.com/2008/11/lift-curve-coil-challenge-2000.html). In this tutorial, we extend the description to other data mining tools (Knime, RapidMiner, Weka and Orange). The second novelty of this tutorial is that we run the experiment under Linux (French version of Ubuntu 8.10; see http://data-mining-tutorials.blogspot.com/2009/01/tanagra-under-linux.html for installing and using Tanagra under Linux). The third novelty is that we handle a large dataset with 2,000,000 examples and 41 variables. It is interesting to study how these tools behave in this configuration, especially because our computer is not very powerful. We note that some tools failed to complete the analysis on the full dataset.

Keywords: scoring, linear discriminant analysis, naive bayes classifier, lift curve, gain chart, cumulative gain chart, knime, rapidminer, weka, orange
Tutorial: en_Tanagra_Gain_Chart.pdf
Dataset: dataset_gain_chart.zip

Saturday, January 24, 2009

Sipina under Linux

In a recent tutorial, we showed that it is possible to work with Tanagra under Linux using Wine. In this document, we do the same with Sipina (data mining software dedicated to decision tree induction), i.e. we install and use Sipina in a Linux environment. We use the Ubuntu distribution (French version 8.10). All the functionalities of Sipina are available, especially the interactive tools that let us explore in depth the subpopulation associated with a node of the tree.

In this tutorial, we go through the following steps: (1) Installing Sipina under Linux; (2) Launching the software; (3) Loading a dataset (text file with tab separator); (4) Choosing the class attribute and the predictive variables; (5) Partitioning the dataset into a training set and a test set; (6) Computing the tree on the training set; (7) Evaluating the tree on the test set, e.g. computing the confusion matrix, the error rate, etc.; (8) Exploring a subpopulation related to a node of the tree; (9) Launching a new analysis on a subpopulation related to a node of the tree.
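
Steps (5) and (7) can be illustrated outside Sipina with a short Python sketch. It does not reproduce what Sipina computes internally; the helper names (`split_train_test`, `confusion_matrix`, `error_rate`), the 70/30 split and the toy labels are all assumptions made for the example.

```python
import random
from collections import Counter


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and cut them into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted):
    """Count each (actual class, predicted class) pair."""
    return Counter(zip(actual, predicted))


def error_rate(actual, predicted):
    """Proportion of instances whose predicted class is wrong."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Step (5): hypothetical dataset of ten row identifiers.
train, test = split_train_test(list(range(10)))

# Step (7): hypothetical true and predicted classes on a test set.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
matrix = confusion_matrix(actual, predicted)
rate = error_rate(actual, predicted)
print(matrix)
print(f"error rate = {rate:.2f}")
```

The confusion matrix is stored here as a counter keyed by (actual, predicted) pairs, which keeps the sketch short; a tool such as Sipina displays the same counts as a cross-tabulation.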

We describe the various features of the software only briefly in this tutorial. They are already presented in several documents available online (http://eric.univ-lyon2.fr/~ricco/sipina.html, see the DOWNLOAD section). Our main goal here is to show the capabilities of Sipina under Linux.

We use the French Ubuntu 8.10 distribution; we have also installed Wine, a program that allows Windows programs to run under Linux.

Keywords: linux, ubuntu, wine, sipina, decision tree
Tutorial: en_Sipina_under_Linux.pdf
Ubuntu, http://www.ubuntu.com/
Wine, https://help.ubuntu.com/community/Wine

Tuesday, January 13, 2009

Tanagra under Linux

Users sometimes ask: "Can I use Tanagra under Linux?" The answer is YES and NO.

NO, we cannot run Tanagra natively under Linux. It is a 32-bit program for Windows.

But YES, we can run Tanagra under Linux using WINE, a well-known Linux application that allows us to run Windows programs on Linux. We can then take full advantage of Tanagra without worrying about compatibility.

In this tutorial, we show how to install and run Tanagra under Ubuntu (a free Linux distribution) using WINE. We can make full use of Tanagra in the Linux environment.

Keywords: linux, ubuntu, wine
Tutorial: en_Tanagra_under_Linux.pdf
Ubuntu, http://www.ubuntu.com/
Wine, https://help.ubuntu.com/community/Wine