Thursday, February 26, 2009

Predictive association rules

Association rule extraction algorithms were originally developed to find logical relations between variables that all have the same status. Predictive association rules, by contrast, seek associations of items that characterize a dependent attribute. We are thus in a supervised learning framework.

Basically, the algorithm is not really modified. Exploration is simply limited to itemsets that include the dependent variable, which reduces the computation time. Two components of Tanagra are dedicated to this task: SPV ASSOC RULE and SPV ASSOC TREE. They are available in the Association tab.

Compared to conventional approaches, the Tanagra components introduce an additional feature: we can specify the class value ("dependent variable = value") that we wish to predict. This makes it possible to tune the parameters of the algorithm closely to the characteristics of the data. This is crucial, for example, when the prior probabilities of the dependent variable values are very different.
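To make the restriction concrete, here is a minimal sketch in Python of rule extraction limited to one class value. It is not Tanagra's implementation: the function name `predictive_rules`, its parameters, and the toy records are all assumptions made for illustration, and the attributes do not come from credit_assoc.xls. It enumerates antecedents by brute force, keeps only rules whose consequent is "dependent variable = value", and reports support, confidence, and lift (confidence divided by the prior probability of the class value).

```python
from itertools import combinations


def predictive_rules(rows, target, target_value,
                     min_support=0.2, min_confidence=0.6, max_len=2):
    """Return rules 'antecedent -> target = target_value'.

    rows: list of dicts mapping attribute name to value (hypothetical format).
    Each rule is a tuple (antecedent, support, confidence, lift).
    """
    n = len(rows)
    n_target = sum(1 for r in rows if r[target] == target_value)
    if n == 0 or n_target == 0:
        return []
    prior = n_target / n  # prior probability of the class value

    # Candidate items are (attribute, value) pairs, excluding the target.
    items = sorted({(a, v) for r in rows for a, v in r.items() if a != target})

    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            # An attribute cannot take two values in the same record.
            if len({a for a, _ in antecedent}) < k:
                continue
            covered = [r for r in rows
                       if all(r.get(a) == v for a, v in antecedent)]
            if not covered:
                continue
            both = sum(1 for r in covered if r[target] == target_value)
            support = both / n
            if support < min_support:
                continue
            confidence = both / len(covered)
            if confidence < min_confidence:
                continue
            rules.append((antecedent, support, confidence, confidence / prior))
    return rules


# Toy records, invented for this example only.
rows = [
    {"housing": "owner", "job": "stable", "credit": "accepted"},
    {"housing": "owner", "job": "stable", "credit": "accepted"},
    {"housing": "owner", "job": "temp", "credit": "accepted"},
    {"housing": "tenant", "job": "stable", "credit": "accepted"},
    {"housing": "tenant", "job": "temp", "credit": "rejected"},
    {"housing": "tenant", "job": "temp", "credit": "rejected"},
]

for antecedent, support, confidence, lift in predictive_rules(
        rows, "credit", "rejected"):
    print(antecedent, round(support, 3), round(confidence, 3), round(lift, 3))
```

Because the thresholds are passed per call, they can be chosen separately for each class value, which is the point of targeting "dependent variable = value" when one class is much rarer than the other. A real implementation would prune candidates level by level instead of testing every combination.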

We have already presented the SPV ASSOC TREE component elsewhere, but in the context of multivariate characterization of groups of individuals (obtained from a clustering algorithm, for instance), where we compared it to the GROUP CHARACTERIZATION component. In this tutorial, we compare the behavior of SPV ASSOC TREE and SPV ASSOC RULE on a prediction task. We highlight their shared properties, the problems they can handle, and their differences. SPV ASSOC RULE, which supplies original rule interestingness measures (the "test value" indicator), can also simplify the rule base.

Keywords: predictive association rules, interestingness measure, rule base ranking, rule base simplification
Components: SPV ASSOC TREE, SPV ASSOC RULE
Tutorial: en_Tanagra_Predictive_AssocRules.pdf
Dataset: credit_assoc.xls