Thursday, July 6, 2017

Cluster analysis with R - HAC and K-Means

This tutorial describes a cluster analysis process. We deal with a set of cheeses (29 instances) characterized by their nutritional properties (9 variables). The aim is to determine groups of homogeneous cheeses in view of their properties.

We inspect and test two approaches using two procedures of the R software: the Hierarchical Agglomerative Clustering algorithm (hclust) ; and the K-Means algorithm (kmeans).

The data file "fromage.txt" comes from the teaching page of Marie Chavent from the University of Bordeaux. The excellent course materials and corrected exercises (commented R code) available on its website will complete this tutorial, which is intended firstly as a simple guide for the introduction of the R software in the context of the cluster analysis.

Keywords: R software, cluster analysis, clustering, hac, hierarchical agglomerative clustering, , k-means, fpc package, principal component analysis, PCA
Components: hclust, kmeans, kmeansruns
Turorial: hac and k-means with R 
Dataset and cource code: hac_kmeans_with_r.zip
References :
Marie Chavent, Teaching Page, University of Bordeaux.