Tanagra - Data Mining and Data Science Tutorials: September 2014

Wednesday, September 24, 2014

Clustering variables (slides)

The aim of clustering variables is to divide a set of numeric variables into disjoint clusters (subsets of variables). In these slides, we present an approach based on the concept of a latent component. Each subset of variables is summarized by a latent component, which is the first factor of a principal component analysis computed on that subset. It is a kind of "centroid" variable that maximizes the sum of the squared correlations with the variables of the subset. Various clustering algorithms based on this idea are described: a hierarchical agglomerative algorithm, a top-down approach, and an approach inspired by the k-means method.
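To make the latent component idea concrete, here is a minimal sketch in plain Python. It is not the code of the Tanagra components: the function names (`pearson`, `standardize`, `latent_component`), the use of power iteration and the synthetic data are all assumptions chosen for illustration. It computes the first principal component of a group of standardized variables and checks that the sum of its squared correlations with the variables equals the first eigenvalue of their correlation matrix.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def latent_component(columns, n_iter=500, seed=0):
    """First principal component of a group of variables (hypothetical helper).

    Returns the component scores and the first eigenvalue of the
    correlation matrix of the group, obtained by power iteration.
    """
    p = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(p)] for i in range(p)]
    # A random start makes it unlikely that the initial vector is
    # orthogonal to the leading eigenvector.
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    eigenvalue = sum(w[i] * corr[i][j] * w[j] for i in range(p) for j in range(p))
    z = [standardize(c) for c in columns]
    scores = [sum(w[j] * z[j][k] for j in range(p)) for k in range(len(z[0]))]
    return scores, eigenvalue


# Synthetic group: three noisy copies of the same underlying signal.
data_rng = random.Random(42)
base = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
group = [[b + data_rng.gauss(0.0, 0.3) for b in base] for _ in range(3)]

scores, eigenvalue = latent_component(group)
sum_sq_corr = sum(pearson(scores, col) ** 2 for col in group)
print("first eigenvalue:", eigenvalue)
print("sum of squared correlations:", sum_sq_corr)
```

The two printed values should agree up to numerical precision, which is why the first eigenvalue can serve as a measure of how well a single latent component summarizes a cluster.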

Keywords: clustering, clustering variables, latent variable, latent component, clusters, groups, bottom-up, hierarchical agglomerative clustering, top down, varclus, k-means, pca, principal component analysis
Components (Tanagra): VARHCA, VARKMEANS, VARCLUS
Slides: Clustering variables
Tutorials:
Tanagra tutorials, "Variable clustering (VARCLUS)", 2008.

Tuesday, September 16, 2014

Single layer and multilayer perceptrons (slides)

Artificial neural networks are computational models inspired by an animal’s central nervous system (in particular the brain) and are capable of machine learning as well as pattern recognition (Wikipedia).

In these slides, we present the single-layer and multilayer perceptrons, which are dedicated to supervised learning. We describe the foundations of these approaches: the difference between linear (single-layer) and non-linear (multilayer) classifiers; the representation power of the models; and the learning algorithms (the Widrow-Hoff rule and the backpropagation algorithm).
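As an illustration of the Widrow-Hoff rule mentioned above, the following is a minimal sketch, assuming a single unit with a linear output trained online (ADALINE-style) and a 0.5 threshold for classification. The function names, the learning rate and the logical AND data set are assumptions made for this example; they are not taken from the slides or from Tanagra.

```python
def train_widrow_hoff(samples, targets, lr=0.05, epochs=500):
    """Online Widrow-Hoff (delta rule) training of a single linear unit."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            # Linear output of the unit for the current example.
            output = bias + sum(w * v for w, v in zip(weights, x))
            error = y - output
            # Move each weight in proportion to the error and its input.
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Assign class 1 when the linear output reaches the threshold."""
    output = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if output >= threshold else 0


# Logical AND: a linearly separable problem, so one layer is enough.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_widrow_hoff(and_inputs, and_targets)
predictions = [predict(weights, bias, x) for x in and_inputs]
print(predictions)
```

The same single unit cannot reproduce XOR, whatever the weights, because the two classes are not linearly separable; this is the kind of limitation that motivates adding a hidden layer.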

Keywords: artificial neural network, perceptron, single layer, SLP, multilayer, MLP, widrow-hoff rule, backpropagation algorithm, linear classifier, non linear classifier
Components (Tanagra): MULTILAYER PERCEPTRON
Slides: Single layer and multilayer perceptrons
Tutorials:
Tanagra tutorials, "Configuration of a multilayer perceptron", December 2017.
Tanagra tutorials, "Multilayer perceptron - Software comparison", 2008.

Saturday, September 13, 2014

Filter approaches for feature selection (slides)

In the supervised learning context, the filter approach for feature selection consists in selecting the most relevant variables before, and independently of, the machine learning algorithm that is subsequently used to build the model.

The methods are mostly based on the concept of correlation (in a broad sense). They are attractive because they can process high-dimensional data sets quickly. On the other hand, they are questionable because they do not take into account the characteristics of the model (e.g. linear, non-linear) that will be built from the selected variables.
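The simplest form of filter can be sketched as follows: score each candidate variable by the absolute value of its correlation with the target, then keep the best ones without ever fitting a model. This is only an illustrative sketch, assuming a numeric target and Pearson correlation as the relevance measure; the function names and the toy data are hypothetical, and this plain ranking does not reproduce any specific Tanagra component.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rank_features(features, target):
    """Rank features by decreasing absolute correlation with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(features, target, k):
    """Keep the names of the k best-ranked features."""
    return [name for name, _ in rank_features(features, target)[:k]]


# Toy data (made up for this example): two informative variables, one noisy.
target = [1, 2, 3, 4, 5, 6]
features = {
    "x_rel": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],    # increases with the target
    "x_inv": [6.2, 4.8, 4.1, 2.9, 2.2, 0.8],    # decreases with the target
    "x_noise": [2, 5, 1, 4, 2, 4],              # weakly related to the target
}

ranking = rank_features(features, target)
selected = select_top_k(features, target, k=2)
print(ranking)
print(selected)
```

Such a ranking evaluates each variable in isolation, so it keeps both `x_rel` and `x_inv` even though they carry almost the same information. Methods such as CFS or FCBF also account for the redundancy between the selected variables.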

Keywords: feature selection, filter methods, embedded methods, wrapper methods
Components (Tanagra): CFS FILTERING, FCBF FILTERING, MIFS FILTERING, MODTREE FILTERING, FEATURE RANKING, FISHER FILTERING, RUNS FILTERING, STEPDISC
Slides: Filter methods
Tutorials:
Tanagra tutorials, "Filter methods for feature selection", 2010.
Tanagra tutorials, "Filter methods for feature selection (continuation)", 2010.