Monday, November 10, 2008

Decision tree and contextual descriptive statistics

SIPINA offers some descriptive statistics features. In themselves, they are not exceptional; many free tools do the same.

It becomes more interesting when we combine these tools with the decision tree, because the exploratory phase is improved. Every node of the tree corresponds to a subpopulation. The variables which do not appear in the tree are not necessarily irrelevant: some of them may have been hidden during tree learning, which selects only the “best” variables. By computing contextual descriptive statistics for each node, we better understand the prediction rules highlighted during the induction process.
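To make the idea concrete outside SIPINA, here is a minimal Python sketch of contextual descriptive statistics. It is not SIPINA's implementation. The records are invented toy values (they are not taken from heart_disease_male.xls), and the variable names and the helper functions `node_members`, `describe` and `contextual_stats` are hypothetical. A node is described by the conditions on the path leading to it; we then compare a variable that is absent from the tree (here `age`) inside the node and in the whole population.

```python
from statistics import mean, stdev

# Toy records, invented for illustration only (NOT the tutorial dataset).
records = [
    {"age": 45, "max_rate": 170, "chest_pain": "typical", "disease": "absence"},
    {"age": 50, "max_rate": 160, "chest_pain": "typical", "disease": "absence"},
    {"age": 62, "max_rate": 120, "chest_pain": "asymptomatic", "disease": "presence"},
    {"age": 58, "max_rate": 130, "chest_pain": "asymptomatic", "disease": "presence"},
    {"age": 40, "max_rate": 180, "chest_pain": "typical", "disease": "absence"},
    {"age": 65, "max_rate": 110, "chest_pain": "asymptomatic", "disease": "presence"},
    {"age": 55, "max_rate": 150, "chest_pain": "asymptomatic", "disease": "absence"},
    {"age": 48, "max_rate": 165, "chest_pain": "typical", "disease": "absence"},
]


def node_members(rows, conditions):
    """Keep the rows that satisfy every (variable, predicate) pair on the path to a node."""
    return [r for r in rows if all(pred(r[var]) for var, pred in conditions)]


def describe(rows, variable):
    """Count, mean and sample standard deviation of one numeric variable."""
    values = [r[variable] for r in rows]
    return {
        "n": len(values),
        "mean": mean(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
    }


def contextual_stats(rows, conditions, variable):
    """Compare a variable inside a node with the same variable in the whole population."""
    root = describe(rows, variable)
    node = describe(node_members(rows, conditions), variable)
    return {"root": root, "node": node, "shift": node["mean"] - root["mean"]}


# Hypothetical node: the tree split on chest_pain, and we inspect 'age',
# a variable that does not appear in the tree.
node_path = [("chest_pain", lambda v: v == "asymptomatic")]
result = contextual_stats(records, node_path, "age")
print(result)
```

A large shift between the node mean and the overall mean suggests that the variable characterizes the subpopulation, even though the tree did not select it.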

Keywords: descriptive statistics, decision tree, interactive exploration
Tutorial: en_sipina_descriptive_statistics.pdf
Dataset: heart_disease_male.xls