Saturday, June 27, 2009

The A PRIORI MR component

Association rule learning is a popular method for discovering interesting relations between variables in large databases. It is often used in market basket analysis, e.g. if a customer buys onions and potatoes, then he also buys beef. But in fact, it can be applied in any area where we want to discover associations between variables.
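To make the onions/potatoes/beef example concrete, here is a minimal Python sketch computing three standard measures of a rule A -> C: support, confidence and lift. The five transactions are hypothetical, made up for illustration only. The formulas are the usual textbook definitions and are not necessarily the exact measures that Tanagra reports.

```python
# Hypothetical toy transactions (for illustration only)
transactions = [
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes", "beef", "milk"},
    {"onions", "potatoes"},
    {"potatoes", "milk"},
    {"onions", "beef"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = supp(A u C) / supp(A)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the support of the consequent alone (1 = independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule = ({"onions", "potatoes"}, {"beef"})
print("support   ", support(rule[0] | rule[1], transactions))
print("confidence", confidence(*rule, transactions))
print("lift      ", lift(*rule, transactions))
```

Here the rule {onions, potatoes} -> {beef} holds in 2 of the 3 baskets containing onions and potatoes (confidence 2/3), while beef alone appears in 3 of 5 baskets, so the lift is slightly above 1.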

We have already described the association rule mining tools of Tanagra in several tutorials. The A PRIORI approach is certainly the most popular. But despite its good properties, this method has a drawback: the number of rules obtained can be very high. Highlighting the most interesting rules, those which are really relevant, becomes a major challenge.
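The level-wise A PRIORI search can be sketched as below. This is a minimal illustration of the classic scheme (count candidates of size k, keep the frequent ones, join them into size k+1 candidates and prune those with an infrequent subset), not the Tanagra implementation; transactions are assumed to be Python sets, and the thresholds `min_support` and `min_confidence` are given as fractions. It also shows why the rule base grows quickly: every frequent itemset of size m can yield up to 2^m - 2 rules.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search for frequent itemsets; returns {frozenset: support}."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each size-k candidate and keep those reaching min_support
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any candidate
        # that has an infrequent k-subset (the Apriori property)
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = candidates
    return frequent

def generate_rules(frequent, min_confidence):
    """Enumerate rules A -> C (A, C non-empty) from every frequent itemset."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent, so the lookup is safe
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules

# Hypothetical toy transactions
transactions = [
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes", "beef", "milk"},
    {"onions", "potatoes"},
    {"potatoes", "milk"},
    {"onions", "beef"},
]
freq = apriori(transactions, min_support=0.4)
rules = generate_rules(freq, min_confidence=0.6)
print(len(freq), "frequent itemsets,", len(rules), "rules")
for a, c, s, conf in rules:
    print(sorted(a), "->", sorted(c), f"supp={s:.2f} conf={conf:.2f}")
```

With only five baskets and four items, these thresholds already produce 9 frequent itemsets and 10 rules; on a real database the count can easily reach thousands, hence the need for tools that help single out the relevant ones.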

In this tutorial, we show how to use the A PRIORI MR component. It differs from the other association rule components by offering additional tools for exploring and assessing the mined rules: original measures based on the “test value” principle provide another way of evaluating the rules; copying the results into a spreadsheet allows a more detailed exploration of the rule base; and subdividing the dataset into train and test sets gives more reliable values of the interestingness measures.
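The two ideas of the previous paragraph, a test-value measure and a train/test evaluation, can be sketched as follows. This is an assumption-laden illustration, not the formulas of A PRIORI MR: it uses the classic hypergeometric “valeur-test” (the standardized gap between the observed count n_AC and its expectation n_A × n_C / N under independence), and a simple random half/half split. The helper names (`rule_test_value`, `train_test_split`, `make_synthetic`) and the synthetic data, in which beef is planted to follow onions and potatoes, are hypothetical.

```python
import math
import random

def rule_counts(antecedent, consequent, transactions):
    """Return (N, n_A, n_C, n_AC) for the rule A -> C."""
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)
    n_c = sum(c <= t for t in transactions)
    n_ac = sum((a | c) <= t for t in transactions)
    return len(transactions), n_a, n_c, n_ac

def confidence(antecedent, consequent, transactions):
    """Estimated P(C | A); NaN when A never occurs."""
    _, n_a, _, n_ac = rule_counts(antecedent, consequent, transactions)
    return n_ac / n_a if n_a else float("nan")

def rule_test_value(antecedent, consequent, transactions):
    """Assumed hypergeometric test value: (n_AC - E[n_AC]) / sd under independence.

    Model: n_A transactions drawn without replacement from N, n_C of which contain C.
    """
    n, n_a, n_c, n_ac = rule_counts(antecedent, consequent, transactions)
    if n < 2:
        return float("nan")
    p_c = n_c / n
    expected = n_a * p_c
    variance = n_a * (n - n_a) / (n - 1) * p_c * (1 - p_c)
    if variance == 0:
        return float("nan")  # degenerate: A or C occurs in all or none of the transactions
    return (n_ac - expected) / math.sqrt(variance)

def train_test_split(transactions, test_fraction=0.5, seed=0):
    """Shuffle a copy of the transactions and cut it into train and test sets."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def make_synthetic(n, seed=1):
    """Hypothetical data: beef is planted to co-occur with onions + potatoes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = {item for item in ("onions", "potatoes", "milk") if rng.random() < 0.5}
        p_beef = 0.8 if {"onions", "potatoes"} <= t else 0.2
        if rng.random() < p_beef:
            t.add("beef")
        data.append(t)
    return data

data = make_synthetic(2000)
train, test = train_test_split(data)
rule = ({"onions", "potatoes"}, {"beef"})
for name, subset in (("train", train), ("test", test)):
    print(name, round(confidence(*rule, subset), 3), round(rule_test_value(*rule, subset), 2))
```

The rule would typically be selected on the train set and its measures then recomputed on the test set: a rule whose test value stays large on data not used to find it is less likely to be a chance artifact of the mining process.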

Keywords: association rule, a priori algorithm, interestingness measure, test value principle
Components: A PRIORI MR
Tutorial: en_Tanagra_APrioriMR_Component.pdf
Dataset: credit_assoc.xls
Reference:
Wikipedia, "Association rule learning"