Friday, October 31, 2008

Handling the WEKA file format (*.arff)

WEKA is a popular free data mining package. It includes a large number of methods, organized mainly around supervised and unsupervised approaches.

WEKA uses its own file format (*.arff). It is a text format whose header describes each variable (attribute) before the data section begins. Importing an ARFF file is therefore easy, since we already know how to handle a text file.
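To illustrate why reading an ARFF file is straightforward, here is a minimal Python sketch of a reader. It is not TANAGRA's importer. It assumes the simplest form of the format: dense data, unquoted attribute names, only numeric and nominal ({...}) attributes, "%" comment lines, and "?" for missing values. The function name parse_arff and the toy data are hypothetical.

```python
import io

# Hypothetical toy file, made up for illustration.
SAMPLE = """% toy example
@relation toy
@attribute age numeric
@attribute sex {M,F}
@data
41,F
?,M
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Minimal ARFF reader (sketch): dense data, unquoted names only."""
    relation = None
    attributes = []  # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            # Header section: relation name and attribute declarations.
            if lower.startswith("@relation"):
                relation = line.split(None, 1)[1]
            elif lower.startswith("@attribute"):
                _, name, spec = line.split(None, 2)
                spec = spec.strip()
                if spec.startswith("{"):
                    values = [v.strip() for v in spec.strip("{}").split(",")]
                    attributes.append((name, values))
                else:
                    attributes.append((name, spec.lower()))
            elif lower.startswith("@data"):
                in_data = True
            continue
        # Data section: one comma-separated instance per line.
        fields = [f.strip() for f in line.split(",")]
        row = []
        for (name, kind), value in zip(attributes, fields):
            if value == "?":
                row.append(None)  # "?" marks a missing value in ARFF
            elif kind in NUMERIC_TYPES:
                row.append(float(value))
            else:
                row.append(value)
        rows.append(row)
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, rows)
```

A real importer must also handle quoted names, string and date attributes, and the sparse data format, which this sketch deliberately ignores.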

In this tutorial, we show how to import an ARFF file into TANAGRA. Missing values are handled with very simple substitution strategies: they are replaced by the average for continuous variables, and a new value (category) is added for discrete variables.
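The substitution strategy described above can be sketched as follows. This is a hypothetical illustration, not TANAGRA's implementation: the data layout (continuous attributes tagged "numeric", discrete attributes given as the list of their values, None for a missing value) and the label MISSING used for the new category are assumptions.

```python
MISSING_LABEL = "MISSING"  # hypothetical name for the added category


def impute(attributes, rows):
    """Replace None by the column mean (continuous) or a new value (discrete)."""
    filled = [list(r) for r in rows]
    new_attributes = []
    for j, (name, kind) in enumerate(attributes):
        column = [r[j] for r in rows]
        if isinstance(kind, list):
            # Discrete variable: add a new value only if something is missing.
            values = list(kind)
            if any(v is None for v in column):
                values.append(MISSING_LABEL)
                for r in filled:
                    if r[j] is None:
                        r[j] = MISSING_LABEL
            new_attributes.append((name, values))
        else:
            # Continuous variable: substitute the mean of the observed values.
            observed = [v for v in column if v is not None]
            if observed:  # a fully missing column is left unchanged
                mean = sum(observed) / len(observed)
                for r in filled:
                    if r[j] is None:
                        r[j] = mean
            new_attributes.append((name, kind))
    return new_attributes, filled


attributes = [("age", "numeric"), ("sex", ["M", "F"])]
rows = [[40.0, "F"], [None, None], [60.0, "M"]]
new_attributes, filled = impute(attributes, rows)
print(new_attributes, filled)
```

Replacing by the mean leaves the column average unchanged, while adding an explicit category lets later methods treat "missing" as information in its own right.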

Once the file is imported, a diagram is automatically created and the analysis can proceed normally. If we save the diagram in the TDM format, the reference to the data file is saved with it. The next time the diagram is loaded, the ARFF file is imported again automatically, without any specific manipulation.

Keywords: WEKA, ARFF file format, data file importation
Components: DATASET
Tutorial: en_Tanagra_Handle_WEKA_File.pdf
Dataset: sick.arff