Previously we have studied the behavior of the factor analysis for mixed data (AFDM in French). This is a generalization of the principal component analysis which can handle both numeric and categorical variables . We can calculate, from a set of mixed variables, components which summarize the information available in the dataset. These components are a new set of numeric attributes. We can use them to perform the clustering analysis based on standard approaches for numeric values.

In this paper, we present a tandem analysis approach for the clustering of mixed data. First, we perform a factor analysis from the original set of variables, both numeric and categorical. Second, we launch the clustering algorithm on the most relevant factor scores. The main advantage is that we can use any type of clustering algorithm for numeric variables in the second phase. We expect also that by selecting a few number of components, we use the relevant information from the dataset, the results are more reliable .

We use

**Tanagra 1.4.49**and

**R (ade4 package)**in this case study.

**Keywords**: AFDM, FAMD, factor analysis for mixed data, clustering, cluster analysis, hac, hierarchical agglomerative clustering, R software, hclust, ade4 package, dudi.mix, cutree, groups description

**Components**: AFDM, HAC, GROUP CHARACTERIZATION, SCATTERPLOT

**Tutorial**: en_Tanagra_Clustering_Mixed_Data.pdf

**Dataset**: bank_clustering.zip

**References**:

Tanagra, "Factor Analysis for Mixed Data".

Jerome Pages, « Analyse Factorielle de Données Mixtes », Revue de Statistique Appliquee, tome 52, n°4, 2004 ; pages 93-111.