Tuesday, December 23, 2008

Association rule mining - Software comparison

This document extends a previous tutorial comparing various implementations of association rule mining (http://data-mining-tutorials.blogspot.com/2008/10/association-rule-learning.html). That tutorial covered Tanagra, Orange and Weka. Here we extend the comparison to R, RapidMiner and Knime.

We work with an attribute-value dataset. This is not the usual data format for association rule mining, where the "native" format is the transactional database. We see in this tutorial that some of the tools recode the data automatically, while others require an explicit transformation. In the latter case, we must find the right components and the correct sequence of operations to produce the transactional format. Depending on the software, this process is not always easy.
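To make the recoding concrete, here is a minimal Python sketch of the idea: each row of an attribute-value table becomes one transaction whose items are "attribute=value" strings. It is not how any of the tools above implements the transformation. The function names (to_transactions, support) and the three sample rows are hypothetical and are not taken from the credit-german dataset.

```python
from typing import Dict, FrozenSet, Iterable, List


def to_transactions(rows: List[Dict[str, str]]) -> List[FrozenSet[str]]:
    """Recode attribute-value rows into transactions.

    Each (attribute, value) pair of a row becomes one item "attribute=value".
    """
    return [
        frozenset(f"{attr}={value}" for attr, value in row.items())
        for row in rows
    ]


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


# Hypothetical attribute-value rows, for illustration only.
rows = [
    {"housing": "own", "job": "skilled", "class": "good"},
    {"housing": "rent", "job": "skilled", "class": "bad"},
    {"housing": "own", "job": "unskilled", "class": "good"},
]

transactions = to_transactions(rows)
own_and_good = support(transactions, {"housing=own", "class=good"})
```

Because every row has exactly one value per attribute, every transaction produced this way has the same length, and two items derived from the same attribute can never occur together.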

The tools studied in this tutorial are: Tanagra 1.4.28, R 2.7.2 (arules package 0.6-6), Orange 1.0b2, RapidMiner Community Edition, Knime 1.3.5 and Weka 3.5.6. These programs load the data and perform the calculations in memory. As the size of the database increases, the real bottleneck becomes the memory available on a personal computer.

Keywords: association rule, frequent itemset
Tutorial: en_Tanagra_Assoc_Rules_Comparison.pdf
Dataset: credit-german.zip
R. Rakotomalala, « Règles d’association » ("Association rules", in French)
Wikipedia, "Association rule learning"