Sunday, December 30, 2012

Discriminant Correspondence Analysis

The aim of the canonical discriminant analysis is to explain the belonging to pre-defined groups of instances of a dataset. The groups are specified by a dependent categorical variable (class attribute, response variable); the explanatory variables (descriptors, predictors, independent variables) are all continuous. So, we obtain a small number of latent variables which enable to distinguish as far as possible the groups. These new features, called factors, are linear combinations of the initial descriptors. The process is a valuable dimensionality reduction technique. But its main drawback is that it cannot be directly applied when the descriptors are discrete. Even if the calculations are possible if we recode the variables using dummy variables for instance, the interpretation of the results - which is one of the main goals of the canonical discriminant analysis - is not really obvious.

In this tutorial, we present a variant of the discriminant analysis which is applicable to discrete descriptors due to Hervé Abdi (2007) . The approach is based on a transformation of the raw dataset in a kind of contingency table. The rows of the table correspond to the values of the target attribute; the columns are the indicators associated to the predictors’ values. Thus, the author suggests to use a correspondence analysis, on the one hand, in order to distinguish the groups, and on the other hand, to detect the relevant relationships between the values of the target attribute and those of the explanatory variables. The author called its approach "discriminant correspondence analysis" because it uses a correspondence analysis framework to solve a discriminant analysis problem.

In what follows, we detail the use of the discriminant correspondence analysis with Tanagra 1.4.48. We use the example described in the Hervé Abdi's paper. The goal is to explain the origin of 12 wines (3 possible regions) using 5 descriptors related to characteristics assessed by professional tasters. In a second part (section 3), we reproduce all the calculations with a program written for R.

Keywords: canonical discriminant analysis, descriptive discriminant analysis, correspondence analysis,  R software, xlsx package, ca package
Components: DISCRIMINANT CORRESPONDENCE ANALYSIS
Tutorial : Tutorial DCA
Dataset: french_wine_dca.zip
References:
H. Abdi, « Discriminant correspondence analysis », In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 270-275, 2007.