Saturday, January 24, 2009

Sipina under Linux

In a recent tutorial, we showed that it is possible to work with Tanagra under Linux using Wine. In this document, we do the same with Sipina (data mining software intended for decision tree induction), i.e. we install and use Sipina in a Linux environment. We use the Ubuntu distribution (French version 8.10). All the functionalities of Sipina are available, especially the interactive tools, which allow us to explore in depth the subpopulation associated with a node of the tree.

In this tutorial, we carry out the following steps: (1) Installing Sipina under Linux; (2) Launching the software; (3) Loading a dataset (text file with tab separator); (4) Choosing the class attribute and the predictive variables; (5) Partitioning the dataset into a train set and a test set; (6) Computing the tree on the train set; (7) Evaluating the tree on the test set, e.g. computing the confusion matrix, the error rate, etc.; (8) Exploring a subpopulation related to a node of the tree; (9) Launching a new analysis on a subpopulation related to a node of the tree.
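
To make steps (3), (5) and (7) concrete, here is a minimal Python sketch of the same ideas outside Sipina. It is only an illustration, not how Sipina works internally: the toy data, the function names, and the majority-class predictor that stands in for the induced tree are all assumptions made for this example.

```python
import csv
import io
import random
from collections import Counter


def load_tab_separated(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def partition(rows, train_fraction, seed=0):
    """Randomly split the rows into a train set and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))


def error_rate(actual, predicted):
    """Fraction of cases whose predicted label differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical toy dataset: "play" is the class attribute.
data = (
    "outlook\tplay\n"
    "sunny\tno\n"
    "rain\tyes\n"
    "overcast\tyes\n"
    "sunny\tno\n"
    "rain\tyes\n"
    "overcast\tyes\n"
)

rows = load_tab_separated(data)
train, test = partition(rows, train_fraction=0.5)

# Stand-in for the decision tree: always predict the majority class
# observed in the train set (an assumption to keep the sketch short).
majority = Counter(r["play"] for r in train).most_common(1)[0][0]

actual = [r["play"] for r in test]
predicted = [majority for _ in test]

print(confusion_matrix(actual, predicted))
print(error_rate(actual, predicted))
```

In Sipina itself these operations are performed through the graphical interface; the sketch only shows what the confusion matrix and the error rate measure on a test set.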

We describe the various features of the software only briefly in this tutorial. They are already presented in several documents available online (http://eric.univ-lyon2.fr/~ricco/sipina.html, see the DOWNLOAD section). Our main goal here is to show the capabilities of Sipina under Linux.

We use the French Ubuntu 8.10 distribution; we have also installed Wine, a program which allows Windows programs to run under Linux.

Keywords: linux, ubuntu, wine, sipina, decision tree
Tutorial: en_Sipina_under_Linux.pdf
References:
Ubuntu, http://www.ubuntu.com/
Wine, https://help.ubuntu.com/community/Wine