Saturday, June 10, 2017

Hierarchical agglomerative clustering (slides)

In data mining, cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters) (Wikipedia).

In this course material, we focus on the hierarchical agglomerative clustering (HAC). Beginning from the individuals which initially represents groups, the algorithms merge the groups in a bottom-up fashion until only the instances are gathered in only one group. The process is materialized by a dendrogram which allows to evaluate the nature of the solution and helps to determine the appropriate number of clusters.

Examples of analysis under R, Python and Tanagra are described.

Keywords: hac, cluster analysis, clustering, unsupervised learning, tandem analysis, two-step clustering, R software, hclust, python, scipy package
Components: HAC, K-MEANS
Slides: cah.pdf
References:
Wikipedia, "Cluster analysis".
Wikipedia, "Hierarchical clustering".