Thursday, February 11, 2010

Supervised rule induction - Software comparison

Supervised rule induction methods play an important role in the Data Mining framework. Indeed, they provide an easy-to-understand classifier. A rule uses the following representation: "IF premise THEN conclusion" (e.g. IF an account problem is reported for a client THEN the credit is not accepted).
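A rule of this form can be sketched as a small Python class. The sketch assumes categorical attributes and a premise made of a conjunction of `attribute = value` tests. The names `Rule`, `covers`, `apply` and the attribute `account_problem` are hypothetical illustrations, not the API of any of the tools discussed here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Example = Dict[str, str]  # hypothetical: one client described by attribute -> value


@dataclass
class Rule:
    """IF all (attribute == value) tests hold THEN predict `conclusion`."""
    premise: Dict[str, str]  # conjunction of attribute = value tests
    conclusion: str          # predicted class label

    def covers(self, example: Example) -> bool:
        # The premise is satisfied only if every test matches.
        return all(example.get(attr) == val for attr, val in self.premise.items())

    def apply(self, example: Example) -> Optional[str]:
        # Return the conclusion when the rule fires, None otherwise.
        return self.conclusion if self.covers(example) else None


# IF an account problem is reported for a client THEN the credit is not accepted
rule = Rule(premise={"account_problem": "yes"}, conclusion="credit_rejected")

print(rule.apply({"account_problem": "yes", "age": "young"}))  # credit_rejected
print(rule.apply({"account_problem": "no", "age": "young"}))   # None
```

A rule that does not fire makes no prediction, which is why rule-based classifiers usually combine several rules with a default class.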

Among rule induction methods, the "separate and conquer" approaches were very popular during the 1990s. Curiously, they appear less often today in conference proceedings and journals. More troublesome still, they are not implemented in commercial software; they are only available in free tools from the Machine Learning community. However, they have several advantages over other techniques.
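The "separate and conquer" idea can be illustrated with a minimal Python sketch, assuming categorical attributes. It learns one rule by top-down (general-to-specific) greedy search, removes the training examples that rule covers, and repeats. Read in order, the rules form a decision list closed by a default rule. The purity criterion and the `min_cover` threshold are simplifying assumptions: CN2 and the implementations compared in the tutorial use their own evaluation measures and stopping rules. All function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Example = Dict[str, str]
Condition = Tuple[str, str]                 # (attribute, value) test
RuleTuple = Tuple[List[Condition], str]     # (premise, conclusion)


def majority(labels: List[str]) -> Tuple[str, float]:
    """Most frequent class label and its proportion (purity)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def covers(premise: List[Condition], x: Example) -> bool:
    """True if every attribute = value test of the premise holds for x."""
    return all(x.get(a) == v for a, v in premise)


def learn_one_rule(data: List[Tuple[Example, str]], min_cover: int) -> RuleTuple:
    """Top-down search: start from the empty premise and greedily add the
    test that most increases the purity of the covered examples."""
    premise: List[Condition] = []
    covered = data
    _, best_purity = majority([y for _, y in covered])
    while best_purity < 1.0:
        used = {a for a, _ in premise}
        candidates = {(a, v) for x, _ in covered for a, v in x.items() if a not in used}
        best, best_sub = None, covered
        for a, v in sorted(candidates):
            sub = [(x, y) for x, y in covered if x.get(a) == v]
            if len(sub) < min_cover:        # too few examples: skip this test
                continue
            _, purity = majority([y for _, y in sub])
            if purity > best_purity:
                best, best_purity, best_sub = (a, v), purity, sub
        if best is None:                    # no test improves the rule: stop
            break
        premise.append(best)
        covered = best_sub
    label, _ = majority([y for _, y in covered])
    return premise, label


def separate_and_conquer(data: List[Tuple[Example, str]], min_cover: int = 2):
    """Conquer (learn one rule), then separate (drop the examples it covers),
    until no examples remain or no useful rule can be found."""
    rules: List[RuleTuple] = []
    remaining = list(data)
    while remaining:
        premise, label = learn_one_rule(remaining, min_cover)
        if not premise:                     # only the empty rule is left: stop
            break
        rules.append((premise, label))
        remaining = [(x, y) for x, y in remaining if not covers(premise, x)]
    # Default rule: majority class of the uncovered examples (or of all data).
    default, _ = majority([y for _, y in (remaining or data)])
    return rules, default


def predict(rules: List[RuleTuple], default: str, x: Example) -> str:
    for premise, label in rules:            # the first matching rule fires
        if covers(premise, x):
            return label
    return default


# Hypothetical toy data (not the life_insurance dataset).
data = [
    ({"account_problem": "yes", "income": "low"}, "rejected"),
    ({"account_problem": "yes", "income": "high"}, "rejected"),
    ({"account_problem": "no", "income": "low"}, "rejected"),
    ({"account_problem": "no", "income": "high"}, "accepted"),
    ({"account_problem": "no", "income": "high"}, "accepted"),
]
rules, default = separate_and_conquer(data)
for premise, label in rules:
    print("IF", " AND ".join(f"{a} = {v}" for a, v in premise), "THEN", label)
print("ELSE", default)
# IF account_problem = yes THEN rejected
# IF income = high THEN accepted
# ELSE rejected
```

Because the rules are ordered, the second rule only applies to clients without an account problem. This is the main difference from an unordered rule set, where overlapping rules must be reconciled by voting.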

In this tutorial, we first describe two separate-and-conquer algorithms for rule induction. Then, we show the behavior of the classification rule algorithms implemented in various tools: Tanagra 1.4.34, Sipina Research 3.3, Weka 3.6.0, R 2.9.2 with the RWeka package, RapidMiner 4.6, and Orange 2.0b.

Keywords: rule induction, separate and conquer, top-down, CN2, decision tree
Components: SAMPLING, DECISION LIST, RULE INDUCTION, TEST
Tutorial: en_Tanagra_Rule_Induction.pdf
Dataset: life_insurance.zip
References:
J. Fürnkranz, "Separate-and-Conquer Rule Learning", Artificial Intelligence Review, 13(1):3-54, 1999.
P. Clark, T. Niblett, "The CN2 Induction Algorithm", Machine Learning, 3(4):261-283, 1989.
P. Clark, R. Boswell, "Rule Induction with CN2: Some recent improvements", Machine Learning - EWSL-91, pages 151-163, Springer Verlag, 1991.