Monday, April 9, 2012

Mining frequent itemsets

Searching regularities from dataset is the main goal of the data mining. They may have various representations. In the market basket analysis, we search the co occurrences of goods (items) i.e. the goods which are often purchased simultaneously. They are called “frequent itemset”. For instance, one result may be "milk and bread are purchased simultaneously in 10% of caddies".

Frequent itemset mining is often presented as the preceding step of the association rule learning algorithm. At the end of the process, we highlight the direction of the relation. We obtain rules. For instance, a rule may be "90% of the customers which buy milk and bread will purchase butter also". This kind of rule can be used in various manners. For instance, we can promote the sales of milk and bread in order to increase the sales of butter.

In fact, frequent itemsets provide also valuable information. Detecting the goods which are purchased simultaneously enables to understand the relation between them. It is a kind of variant of the clustering analysis. We search the items which come together. For instance, we can use this kind of information in order to reorganize the shelves of the store.

In this tutorial, we describe the use of the FREQUENT ITEMSETS component under Tanagra. It is based on the Borgelt's “apriori.exe” program. We use a very small dataset. It enables to everyone to reproduce manually the calculations. But, in a first time, we describe some definitions about the frequent itemset mining process.

Keywords: frequent itemsets, closed itemsets, maximal itemsets, generator itemsets, association rules, R software, arules package
Tutorial: en_Tanagra_Itemset_Mining.pdf
References :
C. Borgelt, "A priori - Association Rule Induction / Frequent Item Set Mining"
R. Lovin, "Mining Frequent Patterns"