Saturday, October 31, 2009

Importing Weka file (.arff) into Sipina

WEKA is a very popular Data Mining tool. It supplies a very large of machine learning methods. WEKA can handle various files. But it has a native format (.ARFF) which is a text file with additional specifications.

The text file format is very simple and very easy to manipulate. But, on the other hand, the processing of this kind of file is often slow, slower than binary file format. When we deal with a moderate size file, the text file is enough efficient. The differences between the time processing are not discernible.

In this tutorial, we show how to import the ARFF file format into Sipina. We subdivide the dataset into train and test samples. Then we learn and we assess a decision tree.

Keywords: decision tree, c4.5, file format, data file importation, weka, arff
Tutorial: en_sipina_weka_file_format.pdf
Dataset: ionosphere.arff
References:
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutmann, I. Witten, "The Weka Data Mining Software: An Update", SIGKDD Explorations, Vol. 11, Issue 1, 2009.