Saturday, May 30, 2009

Understanding the "test value" criterion

The test value (VT) is a criterion often used in various components of TANAGRA. It is mainly used for the characterization of a group of observations according a continuous or categorical variable. The groups may be defined by categories from a discrete variable; they can also be computed by a machine learning algorithm (e.g. clustering, a node of a decision tree, etc.).

The principle is elementary: we compare the values of a descriptive statistic indicator computed on the whole sample and computed on sub sample related to the group. For a continuous variable, we compare the mean; for a discrete one, we compare the proportion.

Despite, or because of its simplicity, the VT is very useful. The formulation that we present in this tutorial is taken from the Lebart et al.’s book (2001). The VT is intensively used in some commercial software such as SPAD (http://eng.spad.eu/). It allows to characterize groups, but it can be used also to strengthen the interpretation of the factors extracted from a factorial analysis process.

In this tutorial, we emphasis the formulas used for both categorical and continuous variables. We put them in connection with the results provided by the GROUP CHARACTERIZATION component of TANAGRA.

Keywords: test value, group characterization, clustering, factorial analysis
Components: Group characterization
Tutorial: en_Tanagra_Comprendre_La_Valeur_Test.pdf
Dataset: heart_disease_male.xls
Reference:
L. Lebart, A. Morineau, M. Piron, « Statistique exploratoire multidimensionnelle », Dunod, 2000 ; pages 181 to 184.