Saturday, June 25, 2016

Image classification with Knime

The aim of image mining is to extract valuable knowledge from image data. In the context of supervised image classification, we want to automatically assign a label to an image based on its visual content. The overall process is the same as the standard data mining process: we learn a classifier from a set of labeled images, then we apply the classifier to a new image in order to predict its class membership. The particularity is that we must extract a vector of numerical features from each image, both before launching the machine learning algorithm and before applying the classifier in the deployment phase.
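In Knime this is done with dedicated image processing nodes. The Python sketch below only illustrates the two phases in plain code, under assumptions of our own: greyscale images stored as lists of pixel rows, a grey-level histogram plus mean intensity as the (hypothetical) feature vector, and a one-level decision tree (a decision stump) as the classifier. The toy 2x2 images and the names `extract_features`, `fit_stump` and `predict_stump` are illustrative only; they do not come from the tutorial or from Knime.

```python
from typing import List, Tuple

# Hypothetical representation: a greyscale image is a list of pixel rows,
# each pixel an integer grey level in 0..255.
Image = List[List[int]]
Stump = Tuple[int, float, int]  # (feature index, threshold, label when value <= threshold)


def extract_features(img: Image, n_bins: int = 4) -> List[float]:
    """Map an image of any size to a fixed-length numeric vector:
    a normalized grey-level histogram followed by the mean intensity in [0, 1]."""
    pixels = [p for row in img for p in row]
    hist = [0] * n_bins
    for p in pixels:
        hist[min(p * n_bins // 256, n_bins - 1)] += 1
    n = len(pixels)
    return [h / n for h in hist] + [sum(pixels) / n / 255]


def fit_stump(X: List[List[float]], y: List[int]) -> Stump:
    """Learn a one-level decision tree for labels 0/1: try every feature and
    every observed threshold, keep the split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left_label in (0, 1):
                errors = sum(
                    (left_label if row[j] <= t else 1 - left_label) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left_label)
    _, j, t, left_label = best
    return j, t, left_label


def predict_stump(stump: Stump, x: List[float]) -> int:
    j, t, left_label = stump
    return left_label if x[j] <= t else 1 - left_label


# Learning phase: extract features from the labeled images, then train the classifier.
train_images = [
    [[10, 20], [30, 40]],      # dark toy image, label 0
    [[15, 25], [35, 5]],       # dark toy image, label 0
    [[200, 220], [240, 250]],  # bright toy image, label 1
    [[180, 210], [230, 190]],  # bright toy image, label 1
]
train_labels = [0, 0, 1, 1]
stump = fit_stump([extract_features(img) for img in train_images], train_labels)

# Deployment phase: the same feature extraction is applied before prediction.
new_image = [[190, 230], [250, 210]]
print(predict_stump(stump, extract_features(new_image)))  # → 1
```

The key design point is that `extract_features` returns a vector of the same length whatever the image size, so any standard learner (here a stump, in the tutorial a decision tree or a random forest) can be trained on it.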

In this tutorial, we deal with an image classification task: the goal is to automatically detect the images that contain a car. The main finding is that, even though I have only basic knowledge of image processing, I was able to carry out the analysis easily, which is symptomatic of the usability of Knime in this context.

Keywords: image mining, image classification, image processing, feature extraction, decision tree, random forest, knime
Tutorial: en_Tanagra_Image_Mining_Knime.pdf
Dataset and program (Knime archive): image mining tutorial
Knime Image Processing.
S. Agarwal, A. Awan, D. Roth, "UIUC Image Database for Car Detection".

Sunday, June 19, 2016

Gradient boosting (slides)

Gradient boosting is an ensemble method that generalizes boosting by making it possible to use other loss functions ("standard" boosting implicitly uses an exponential loss function).

These slides show the ins and outs of the method. Gradient boosting for regression is detailed first; the classification problem is presented afterwards.
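The slides study the gbm, mboost, xgboost and scikit-learn packages; the sketch below uses none of them. It is a minimal pure-Python illustration of gradient boosting for regression under our own assumptions: a single numeric input, the squared loss (whose negative gradient is simply the residual), regression stumps as base learners, and a fixed shrinkage factor `learning_rate`. All function names are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)
Model = Tuple[float, float, List[Stump]]  # (initial constant, learning rate, stumps)


def fit_regression_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares regression stump on a single input: each side of the split
    predicts the mean of its targets. Assumes x has at least two distinct values."""
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def stump_predict(s: Stump, xi: float) -> float:
    t, lm, rm = s
    return lm if xi <= t else rm


def gradient_boost(x: List[float], y: List[float], n_rounds: int = 100,
                   learning_rate: float = 0.1) -> Model:
    """Gradient boosting for regression with the squared loss (y - F)^2 / 2.
    The negative gradient of this loss with respect to F is the residual y - F,
    so each round fits a stump to the current residuals (pseudo-residuals)."""
    f0 = sum(y) / len(y)  # initial constant model: the mean minimizes the squared loss
    F = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, F)]
        s = fit_regression_stump(x, residuals)
        stumps.append(s)
        # Shrinkage: move only a fraction of the way toward the new stump's fit.
        F = [fi + learning_rate * stump_predict(s, xi) for fi, xi in zip(F, x)]
    return f0, learning_rate, stumps


def boost_predict(model: Model, xi: float) -> float:
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, xi) for s in stumps)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]  # toy step-shaped target
model = gradient_boost(x, y)
print(round(boost_predict(model, 2), 3), round(boost_predict(model, 7), 3))  # → 1.0 3.0
```

With another differentiable loss, the residual line would be replaced by the negative gradient of that loss (for the absolute loss, the sign of the residual), and the leaf values would normally be re-optimized for that loss as well; this is the generalization that distinguishes gradient boosting from standard boosting.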

The implementations available in R and Python packages are also studied.

Keywords: boosting, regression tree, package gbm, package mboost, package xgboost, R, Python, package scikit-learn, sklearn
Slides: Gradient Boosting
R. Rakotomalala, "Bagging, Random Forest, Boosting", December 2015.
A. Natekin, A. Knoll, "Gradient boosting machines, a tutorial", Frontiers in Neurorobotics, December 2013.

Monday, June 13, 2016

Tanagra and Sipina add-ins for Excel 2016

The add-ins “tanagra.xla” and “sipina.xla” have contributed greatly to the popularity of the Tanagra and Sipina software applications. They add menus dedicated to data mining to Excel and implement a simple bridge between the data in the spreadsheet and Tanagra or Sipina.

I developed and tested the latest versions of the add-ins for Excel 2007 and 2010. I recently gained access to Excel 2016 and checked the add-ins: the tools work without a hitch.

Keywords: data importation, excel data file, add-in, add-on, xls, xlsx
Tutorial: en_Tanagra_Add_In_Excel_2016.pdf
Tanagra, "Tanagra add-in for Excel 2007 and 2010", August 2010.
Tanagra, "Sipina add-in for Excel 2007 and 2010", June 2016.

Sunday, June 12, 2016

Sipina add-in for Excel 2007 and 2010

SIPINA is a data mining software application that implements various supervised learning paradigms. It is an old tool, but it is still used because it is the only free tool that provides fully functional interactive decision tree capabilities.

This tutorial briefly describes the installation and use of the add-in "sipina.xla" in Excel 2007. The approach is easily generalized to Excel 2010. A similar document exists for Tanagra. Nevertheless, it seemed necessary to me to clarify the procedure, especially since several users have requested it. Other tutorials exist for earlier versions of Excel (1997-2003) and for Calc (LibreOffice and OpenOffice).

A new tutorial will come soon. It shows that the add-in also works properly under Excel 2016.

Keywords: data importation, excel data file, add-in, add-on, xls, xlsx
Tutorial: en_sipina_excel_addin.pdf
Dataset: heart.xls
Tanagra, "Tanagra add-in for Office 2007 and Office 2010", August 2010.
Tanagra, "Tanagra and Sipina add-ins for Excel 2016", June 2016.