Friday, July 14, 2017

Kohonen map with R

This tutorial complements the course material concerning the Kohonen map or Self-organizing map (June 2017). In a first time, we try to highlight two important aspects of the approach: its ability to summarize the available information in a two-dimensional space; Its combination with a cluster analysis method for associating the topological representation (and the reading that one can do) to the interpretation of the groups obtained from the clustering algorithm. We use the R software and the “Kohonen” package (Wehrens et Buydens, 2007). In a second time, we carry out a comparative study of the quality of the partitioning with the one obtained with the K-means algorithm. We use an external evaluation i.e. we compare the clustering results with pre-established classes. This procedure is often used in research to evaluate the performance of clustering methods. It takes on its meaning when it is applied to artificial data where the true class membership is known. We use the K-Means and Kohonen-Som components of Tanagra.

This tutorial is based on the Shane Lynn's article on the R-bloggers website (Lynn, 2014). I completed it by introducing the intermediate calculations to better understand the meaning of the charts, and by conducting the comparative study.

Keywords: som, self organizing map, kohonen network, data visualization, dimensionality reduction, cluster analysis, clustering, hierarchical agglomerative clustering, hac, two-step clustering, R software, kohonen package, k-means, external evaluation, heatmaps
Components: KOHONEN-SOM
Tutorial: Kohonen map with R
Program and dataset: waveform - som
References:
Tanagra tutorial, "Self-organizing map (slides)", June 2017.
Tanagra Tutorial, "Self-organizing map (with Tanagra)", July 2009.