Sunday, May 16, 2010

Sipina Decision Graph Algorithm (case study)

SIPINA is a data mining tool. But it is also a machine learning method. It corresponds to an algorithm for the induction of decision graphs (see References, section 9). A decision graph is a generalization of a decision tree where we can merge any two terminal nodes of the graph, and not only the leaves issued from the same node.

The SIPINA method is only available under the version 2.5 of SIPINA data mining tool. This version has some drawbacks. Among others, it cannot handle large datasets (higher than 16.383 instances). But it is the only tool which implements the decision graphs algorithm. This is the main reason for which this version is available online to date. If we want to implement a decision tree algorithm such as C4.5 or CHAID, or if we want to create interactively a decision tree , it is more advantageous to use the research version (named also version 3.0). The research version is more powerful and it supplies much functionality for the data exploration.

In this tutorial, we show how to implement the Sipina decision graph algorithm with the Sipina software version 2.5. We want to predict the low birth weight of newborns from the characteristics of their mothers. We want foremost to show how to use this 2.5 version which is not well documented. We want also to point out the interest of the decision graphs when we treat a small dataset i.e. when the data fragmentation becomes a crucial problem.

Keywords: decision graphs, decision trees, sipina version 2.5
Tutorial: en_sipina_method.pdf
Dataset: low_birth_weight_v4.xls
Wikipedia, "Decision tree learning"
J. Oliver, Decision Graphs: An extension of Decision Trees, in Proc. of Int. Conf. on Artificial Intelligence and Statistics, 1993.
R. Rakotomalala, Graphes d'induction, PhD Dissertation, University Lyon 1, 1997 (URL:; in french).
D. Zighed, R. Rakotomalala, Graphes d'induction : Apprentissage et Data Mining, Hermes, 2000 (in French).