Thursday, February 26, 2009

Predictive association rules

The algorithms for association rule extraction were originally developed to find logical relations between variables that all have the same status. Predictive association rules, by contrast, seek associations of items that characterize a dependent attribute. We are thus in a supervised learning framework.

Basically, the algorithm itself is hardly modified: exploration is simply restricted to itemsets that include the dependent variable, which reduces the computation time. Two Tanagra components are dedicated to this task, SPV ASSOC RULE and SPV ASSOC TREE. They are available in the Association tab.

Compared to conventional approaches, the Tanagra components add one specific feature: we can specify the class value ("dependent variable = value") that we wish to predict. This makes it possible to tune the parameters of the algorithm finely, according to the characteristics of the data. This is crucial, for example, when the prior probabilities of the dependent variable values are very different.
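To make the idea concrete, here is a minimal Python sketch of rule extraction restricted to a single consequent "dependent variable = value". It is not Tanagra's implementation: the function name `predictive_rules`, its parameters, and the toy data are assumptions made for illustration, and the search is a plain enumeration of antecedents rather than an optimized A PRIORI search.

```python
from itertools import combinations


def predictive_rules(rows, target, target_value,
                     min_support=0.1, min_confidence=0.6, max_len=2):
    """Return rules 'antecedent => target = target_value'.

    rows is a list of dicts mapping attribute names to values.
    Each rule is a tuple (antecedent, support, confidence).
    """
    n = len(rows)
    # Items are (attribute, value) pairs taken from the predictive attributes only.
    items = sorted({(a, v) for row in rows for a, v in row.items() if a != target})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            # An antecedent cannot use the same attribute twice.
            if len({a for a, _ in antecedent}) < size:
                continue
            covered = [r for r in rows if all(r.get(a) == v for a, v in antecedent)]
            hits = sum(1 for r in covered if r[target] == target_value)
            # Support is computed on the itemset that includes the class value.
            support = hits / n
            if not covered or support < min_support:
                continue
            confidence = hits / len(covered)
            if confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules


# Hypothetical toy data, not taken from credit_assoc.xls.
rows = [
    {"housing": "owner", "job": "stable", "credit": "accepted"},
    {"housing": "owner", "job": "stable", "credit": "accepted"},
    {"housing": "owner", "job": "temporary", "credit": "accepted"},
    {"housing": "tenant", "job": "stable", "credit": "refused"},
    {"housing": "tenant", "job": "temporary", "credit": "refused"},
    {"housing": "tenant", "job": "temporary", "credit": "refused"},
]

rules = predictive_rules(rows, "credit", "accepted",
                         min_support=0.3, min_confidence=0.8)
for antecedent, support, confidence in rules:
    print(antecedent, round(support, 3), round(confidence, 3))
```

Because the thresholds are set for one class value at a time, a rare class can be given a lower minimum support than a frequent one, which is the point made above about very different prior probabilities.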

We have already presented the SPV ASSOC TREE component elsewhere, but in the context of the multivariate characterization of groups of individuals (obtained from a clustering algorithm, for instance), where we compared it to the GROUP CHARACTERIZATION component. In this tutorial, we compare the behavior of SPV ASSOC TREE and SPV ASSOC RULE on a prediction task. We highlight their shared properties, the problems they can handle, and their differences. SPV ASSOC RULE, which supplies original rule interestingness measures (the "test value" indicator), can also simplify the rule base.

Keywords: predictive association rules, interestingness measure, rule base ranking, rule base simplification
Components: SPV ASSOC TREE, SPV ASSOC RULE
Tutorial: en_Tanagra_Predictive_AssocRules.pdf
Dataset: credit_assoc.xls

Sunday, February 22, 2009

Interestingness measures for association rules

This document outlines the measures for assessing association rules that the A PRIORI MR and SPV ASSOC RULE components provide. They come from studies reported in several publications by A. Morineau and R. Rakotomalala.

A measure characterizes the relevance of a rule and can be used to rank rules. It should also help to distinguish the rules that are "significantly interesting" from those that are irrelevant. This last point remains an open question: there is no really satisfactory solution at this time.
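As an illustration of ranking rules by a measure, the following Python sketch computes three standard measures (support, confidence, lift) from counts and sorts a rule base by lift. These are the usual textbook measures, not the test value based measures of the Tanagra components; the `Rule` class, the `measures` function, and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    n_antecedent: int  # transactions matching the antecedent
    n_consequent: int  # transactions matching the consequent
    n_both: int        # transactions matching both


def measures(rule, n):
    """Return (support, confidence, lift) for a rule over n transactions."""
    support = rule.n_both / n
    confidence = rule.n_both / rule.n_antecedent
    # Lift compares the confidence with the prior frequency of the consequent.
    lift = confidence / (rule.n_consequent / n)
    return support, confidence, lift


n = 100  # hypothetical number of transactions
rules = [
    Rule("A => C", n_antecedent=40, n_consequent=50, n_both=30),
    Rule("B => C", n_antecedent=20, n_consequent=50, n_both=18),
    Rule("D => C", n_antecedent=60, n_consequent=50, n_both=30),
]

# Rank the rule base from the highest lift to the lowest.
ranked = sorted(rules, key=lambda r: measures(r, n)[2], reverse=True)
for rule in ranked:
    support, confidence, lift = measures(rule, n)
    print(rule.name, round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift of 1 means the antecedent tells us nothing about the consequent, which gives a simple reference point for the ranking. It does not settle the harder question of which rules are significantly interesting.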

The A PRIORI MR and SPV ASSOC RULE components are experimental tools for evaluating the rules extracted by the association rule induction algorithm. They make it possible to evaluate the rules using measures based on the test value principle.

Keywords: association rules, interestingness measure, test value
Components: A PRIORI MR, SPV ASSOC RULE
Tutorial: en_Tanagra_APrioriMR_Measures.pdf
References:
R. Rakotomalala, A. Morineau, 2008. “The TVpercent principle for the counterexamples statistic”, in Statistical Implicative Analysis, Studies in Computational Intelligence Series, 127, 449-462, Springer -- http://www.springerlink.com/content/g245317206950529/
Wikipedia, "Association rule learning"