Thursday, September 28, 2017

SVM: Support Vector Machine in R and Python

This tutorial complements the course material on the Support Vector Machine (SVM) approach.

It highlights two important aspects of the method: first, the position of the support points (support vectors) and the shape of the decision boundary in the representation space when we build a linear separator; second, the difficulty of determining the “best” parameter values for a given problem.
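To make the first point concrete, here is a minimal sketch with scikit-learn. The six 2D points are invented for illustration and are not the tutorial's dataset; the large value of C is an assumption used to approximate a hard-margin linear separator.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for this sketch).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily, which approximates
# a hard-margin linear SVM on separable data.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

# For a linear kernel, the boundary is the line w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

# The support points are the training instances that define the margin.
support_points = clf.support_vectors_

# Distance between the two margin lines w . x + b = -1 and w . x + b = +1.
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support points:")
print(support_points)
print("margin width =", margin_width)
```

Only the support points influence the position of the boundary: removing any other training point and refitting should leave `w` and `b` essentially unchanged.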

We will use R (“e1071” package) and Python (“scikit-learn” package).
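The search for good parameter values can be sketched with a cross-validated grid search in scikit-learn. This is only an illustration under assumptions of my own: the synthetic `make_moons` data, the RBF kernel, and the candidate values for `C` and `gamma` are not taken from the tutorial.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data that is not linearly separable (illustrative only).
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# Candidate values are arbitrary; in practice the grid depends on the problem.
param_grid = {
    "C": [0.1, 1.0, 10.0, 100.0],
    "gamma": [0.01, 0.1, 1.0, 10.0],
}

# Every (C, gamma) pair is evaluated with 5-fold cross-validation.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy: %.3f" % search.best_score_)
```

The selected pair is only the best within the grid that was tried, which is one reason why finding the “best” values is hard: a different grid, or a different resampling scheme, may lead to another choice.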

Keywords: svm, e1071 package, R software, Python software, scikit-learn package, sklearn
Tutorial: SVM - Support Vector Machine
Dataset and programs:
Tanagra Tutorial, "Support Vector Machine", May 2017.
Tanagra Tutorial, "Implementing SVM on large dataset", July 2009.

Monday, September 11, 2017

Association rule learning with ARS

SIPINA is known for its decision tree induction algorithms. In fact, the distribution also includes two lesser-known tools: REGRESS, which specializes in multiple linear regression and which we described in one of our tutorials; and an association rule extraction tool, simply called Association Rule Software (ARS).

In this tutorial, I describe how to use the ARS tool. Its main advantage is its interaction with the Excel spreadsheet. We launch the software from Excel using the “sipina.xla” add-in, and we can easily retrieve the mined rules in the spreadsheet. We can then explore them using Excel's data handling capabilities. The ability to filter and sort rules according to different criteria is a great help in detecting interesting rules. This is a very important aspect, because the profusion of rules can quickly overwhelm the data miner.
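The filtering and sorting step is not specific to Excel. As a rough sketch, the same idea can be written in Python; the `Rule` class, the helper functions, and all the rule values below are hypothetical and do not come from ARS or from the tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One association rule with common interestingness measures."""
    antecedent: tuple
    consequent: tuple
    support: float
    confidence: float
    lift: float


# Invented rules, standing in for the output of a rule mining tool.
rules = [
    Rule(("bread",), ("butter",), 0.20, 0.80, 1.60),
    Rule(("milk",), ("bread",), 0.30, 0.50, 0.90),
    Rule(("beer", "chips"), ("salsa",), 0.05, 0.90, 3.00),
    Rule(("tea",), ("sugar",), 0.10, 0.65, 1.20),
]


def filter_rules(rules, min_confidence=0.0, min_lift=0.0):
    """Keep only the rules that reach both thresholds."""
    return [r for r in rules
            if r.confidence >= min_confidence and r.lift >= min_lift]


def sort_rules(rules, key="lift", descending=True):
    """Order the rules by one of their numeric measures."""
    return sorted(rules, key=lambda r: getattr(r, key), reverse=descending)


selected = sort_rules(filter_rules(rules, min_confidence=0.6, min_lift=1.0))

for r in selected:
    print(" & ".join(r.antecedent), "->", " & ".join(r.consequent),
          "| confidence=%.2f lift=%.2f" % (r.confidence, r.lift))
```

With these made-up values, the rule about milk and bread is discarded by the thresholds, and the three remaining rules are listed by decreasing lift.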

Keywords: ARS, association rule software, excel spreadsheet, filtering and sorting rules, interestingness measures
Tutorial: en_Tanagra_Association_Sipina.pdf
Tanagra Tutorial, "Association rule learning (slides)", August 2014.