Wednesday, January 15, 2014

Scilab and R - Performance comparison

In a previous tutorial, we studied Scilab in a data mining context. We noted that Scilab is well suited to data mining and is a credible alternative to R, but also that far fewer toolboxes for statistical processing and data mining are available for it than for R. In this second tutorial, we evaluate how Scilab behaves on a dataset with 500,000 instances and 22 attributes, and we compare its performance with that of R. We use two criteria: memory usage, as measured in the Windows Task Manager, and the execution time of each step of the process.
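The per-step timing criterion can be illustrated with a small harness that runs the steps in order and records the wall-clock time of each one. This is a minimal Python sketch under our own assumptions, not the procedure used in the tutorial (which timed Scilab and R directly); the helper `timed_steps` and the step functions passed to it are hypothetical. It measures elapsed time only; memory usage was read externally from the Task Manager.

```python
import time

def timed_steps(steps):
    """Run (name, function) pairs in order (hypothetical helper).

    Each function receives the result of the previous step (None for the
    first one). Returns the final result and a list of (name, seconds).
    """
    timings = []
    result = None
    for name, fn in steps:
        start = time.perf_counter()          # high-resolution wall clock
        result = fn(result)
        timings.append((name, time.perf_counter() - start))
    return result, timings

if __name__ == "__main__":
    # Toy steps standing in for "load the file", "fit the model", etc.
    result, timings = timed_steps([
        ("load", lambda _: list(range(1_000_000))),
        ("sum", lambda xs: sum(xs)),
    ])
    for name, seconds in timings:
        print(f"{name}: {seconds:.4f} s")
```

Chaining the steps this way keeps the timing code separate from the processing code, so each stage of the scenario is measured the same way.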

An exhaustive comparison is not possible. To delimit the scope of our study, we defined a standard supervised learning scenario: loading a data file, building a predictive model with linear discriminant analysis, and computing the confusion matrix and the resubstitution error rate (the error rate measured on the same data used to build the model). This study is therefore incomplete. Still, the results suggest that Scilab is less efficient in the data management step but quite efficient in the modeling step, although this last assessment depends on the toolbox used.
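The modeling and evaluation steps of this scenario can be sketched in a few lines. The following is a minimal pure-Python illustration of Gaussian linear discriminant analysis (pooled within-class covariance, class priors estimated from frequencies), followed by the confusion matrix and the resubstitution error rate. It is our own sketch, not the Scilab toolbox or R routine used in the tutorial, and it is only practical on small data; all function names are hypothetical.

```python
import math
from collections import Counter

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def fit_lda(X, y):
    """Return {class: (coefficients, constant)} of the linear discriminant functions."""
    n, p = len(X), len(X[0])
    labels = sorted(set(y))
    counts = Counter(y)
    means = {k: [0.0] * p for k in labels}
    for xi, yi in zip(X, y):
        for j in range(p):
            means[yi][j] += xi[j]
    for k in labels:
        means[k] = [v / counts[k] for v in means[k]]
    # Pooled within-class covariance matrix, divided by n - K.
    W = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        d = [xi[j] - means[yi][j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                W[a][b] += d[a] * d[b]
    dof = n - len(labels)
    W = [[v / dof for v in row] for row in W]
    model = {}
    for k in labels:
        coef = solve(W, means[k])                        # W^{-1} mu_k
        const = (-0.5 * sum(c * mu for c, mu in zip(coef, means[k]))
                 + math.log(counts[k] / n))              # + log prior
        model[k] = (coef, const)
    return model

def predict(model, x):
    """Assign x to the class with the highest discriminant score."""
    return max(model, key=lambda k: model[k][1] + sum(c * v for c, v in zip(model[k][0], x)))

def confusion_matrix(y_true, y_pred, labels):
    """Rows are observed classes, columns are predicted classes."""
    idx = {k: i for i, k in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm

def error_rate(cm):
    """Share of off-diagonal cells in the confusion matrix."""
    total = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 1 - correct / total

if __name__ == "__main__":
    X = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [5.0, 8.0], [6.0, 9.0], [5.5, 8.5]]
    y = ["A", "A", "A", "B", "B", "B"]
    model = fit_lda(X, y)
    # Resubstitution: predict the very instances used to fit the model.
    cm = confusion_matrix(y, [predict(model, x) for x in X], ["A", "B"])
    print(cm, error_rate(cm))  # [[3, 0], [0, 3]] 0.0
```

Because the model is evaluated on its own training data, the resubstitution error rate is optimistic; it is used here, as in the tutorial scenario, only as a common workload for comparing tools.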

Keywords: scilab, toolbox, nan, linear discriminant analysis, R software, sipina, tanagra
Tutorial: en_Tanagra_Scilab_R_Comparison.pdf
Dataset: waveform_scilab_r.zip
References:
Scilab - https://www.scilab.org/en
Michaël Baudin, "Introduction to Scilab (in French)", Developpez.com.