In this tutorial, we use the K-Means algorithm. Each new instance is assigned to the nearest group, measured by its distance to the group centers (centroids). This approach is sound because the assignment rule used in the deployment phase is consistent with the learning algorithm: K-Means itself builds the groups by allocating each instance to its nearest centroid. This would not hold with another learning algorithm, e.g. HAC (hierarchical agglomerative clustering) with the single linkage aggregation rule, where groups are merged according to the distance between their closest members. In that context, the distance to the group centers is inadequate. In short, the classification strategy used for deployment must be consistent with the learning strategy.
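The consistency argument can be illustrated with a small sketch. The toy clusters and the helper names (`centroid`, `assign_nearest_centroid`, `assign_nearest_member`) are hypothetical and do not come from the tutorial. Group A is an elongated chain, which single linkage tends to produce, and group B is compact. The nearest-centroid rule (consistent with K-Means) and the nearest-member rule (closer in spirit to single linkage) send the same new point to different groups.

```python
import math

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def assign_nearest_centroid(x, clusters):
    """K-Means-consistent rule: the cluster whose centroid is closest to x."""
    return min(clusters, key=lambda name: math.dist(x, centroid(clusters[name])))

def assign_nearest_member(x, clusters):
    """Single-linkage-style rule: the cluster containing the point closest to x."""
    return min(clusters, key=lambda name: min(math.dist(x, p) for p in clusters[name]))

# Hypothetical toy clusters
clusters = {
    "A": [(float(i), 0.0) for i in range(7)],                   # chain, centroid (3, 0)
    "B": [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)],  # compact, centroid (10.5, 0.5)
}
new_point = (7.5, 0.0)
print(assign_nearest_centroid(new_point, clusters))  # B: 3.04 from B's centroid vs 4.5 from A's
print(assign_nearest_member(new_point, clusters))    # A: 1.5 from (6, 0) vs 2.5 from (10, 0)
```

The new point lies right at the end of chain A, yet its distance to A's centroid is larger than its distance to B's centroid. This is why a centroid-based deployment rule misrepresents groups learned with single linkage.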

All the descriptors in our dataset are discrete. The K-Means algorithm cannot handle this kind of data directly, so we must first transform them. We use multiple correspondence analysis (MCA), which maps the categorical descriptors to continuous factor scores on which K-Means can operate.
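The full learning and deployment chain can be sketched in NumPy. This is a minimal illustration, not the Tanagra or FactoMineR implementation. It assumes that MCA is computed as a correspondence analysis of the complete disjunctive (one-hot) table, obtained through an SVD of the standardized residuals. New individuals are projected as supplementary rows with the transition formula $F = (Z/Q)\,\Gamma$, where $Q$ is the number of variables and $\Gamma$ holds the column standard coordinates. The toy data and all function names are hypothetical.

```python
import numpy as np

def disjunctive(rows, levels):
    """Complete disjunctive coding: one 0/1 column per (variable, level) pair."""
    return np.array([[1.0 if row[j] == lev else 0.0
                      for j, var_levels in enumerate(levels) for lev in var_levels]
                     for row in rows])

def fit_mca(Z, n_components):
    """MCA as a correspondence analysis of the indicator matrix Z.
    Returns the column standard coordinates Gamma used to project individuals."""
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    # Standardized residuals D_r^-1/2 (P - r c^T) D_c^-1/2
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:n_components].T / np.sqrt(c)[:, None]

def project(Z, gamma, n_vars):
    """Row factor scores F = (Z / Q) Gamma; also valid for new (supplementary) rows."""
    return (Z / n_vars) @ gamma

def assign(X, centers):
    """Nearest-centroid rule using squared Euclidean distance."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations starting from k randomly chosen individuals."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Recompute each center; keep the old one if its group became empty
        new_centers = np.array([X[labels == g].mean(axis=0) if np.any(labels == g)
                                else centers[g] for g in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Hypothetical discrete training data: (age band, has card, savings level)
train = [("young", "yes", "low"), ("young", "yes", "low"), ("young", "no", "low"),
         ("old", "no", "high"), ("old", "no", "high"), ("old", "yes", "high")]
levels = [sorted(set(col)) for col in zip(*train)]  # levels seen during learning
Q = len(levels)

Z_train = disjunctive(train, levels)
gamma = fit_mca(Z_train, n_components=2)
X_train = project(Z_train, gamma, Q)
centers = kmeans(X_train, k=2)

# Deployment: same coding, same projection, same nearest-centroid rule
new = [("young", "no", "low"), ("old", "yes", "high")]
print(assign(project(disjunctive(new, levels), gamma, Q), centers))
```

The key design choice is that deployment reuses every object fitted during learning: the level lists, the MCA coefficients `gamma`, and the K-Means centers. As a result, a new individual is coded, projected, and assigned exactly like the training individuals.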

In this tutorial, we compare the results of Tanagra 1.4.28 and R 2.7.2.

**Keywords**: data clustering, k-means, multiple correspondence analysis, factorial analysis, clusters interpretation, data export

**Components**: MULTIPLE CORRESPONDENCE ANALYSIS, K-MEANS, GROUP CHARACTERIZATION, CONTINGENCY CHI-SQUARE, EXPORT DATASET

**Tutorial**: en_Tanagra_KMeans_Deploiement.pdf

**Dataset**: banque_classif_deploiement.zip

**References**:

Wikipedia (en), "K-Means algorithm".

F. Husson, S. Lê, J. Josse, J. Mazet, "FactoMineR – A package dedicated to Factor Analysis and Data Mining with R".