Thursday, May 30, 2013

Sipina - Version 3.11

A new multithreaded version of linear discriminant analysis has been added to Sipina 3.11. Compared with the previous one, it has two advantages: (1) it can use all the resources available on machines with a multi-core processor or several processors; (2) the load balancing is better. However, it requires more memory, because the internal calculation structures are duplicated M times (M is the number of threads).
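
The idea of duplicating the calculation structures M times can be sketched as follows. This is only an illustration in Python, not the Sipina code: the function names (partial_sums, class_means) and the splitting of the rows into contiguous chunks are assumptions made for the example. Each worker fills its own private accumulator, so no locking is needed, and the M accumulators are merged at the end. In CPython the threads will not give a real speed-up for this kind of pure Python loop; the sketch only shows the organization of the calculations and why the memory grows with M.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(rows):
    """Per-class count and per-class sum of each variable for one chunk."""
    acc = {}  # private to this worker: label -> [count, sums]
    for label, x in rows:
        entry = acc.setdefault(label, [0, [0.0] * len(x)])
        entry[0] += 1
        for j, value in enumerate(x):
            entry[1][j] += value
    return acc


def class_means(rows, n_threads):
    """Class means computed from n_threads private accumulators (hypothetical helper)."""
    # Contiguous chunks of near-equal size, one per thread at most.
    size = max(1, -(-len(rows) // n_threads))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_sums, chunks))
    # Merge the private structures into a single one.
    total = {}
    for acc in partials:
        for label, (count, sums) in acc.items():
            entry = total.setdefault(label, [0, [0.0] * len(sums)])
            entry[0] += count
            entry[1] = [a + b for a, b in zip(entry[1], sums)]
    return {label: [s / count for s in sums]
            for label, (count, sums) in total.values_view()} if False else {
        label: [s / count for s in sums] for label, (count, sums) in total.items()}
```

The same pattern would apply to the covariance structures, which are the larger ones: with p variables, each extra thread costs roughly one more set of p x p accumulators.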

A tutorial will follow, comparing the behavior of this approach with the previous version and with the single-threaded implementation.

Keywords: linear discriminant analysis, multithreaded implementation, multithread
Sipina website: Sipina
Download: Setup file

Wednesday, May 29, 2013

Multithreading for linear discriminant analysis

Most modern personal computers have a multicore CPU, which considerably increases their processing capabilities. Unfortunately, the popular free data mining tools do not really incorporate multithreaded processing in the data mining algorithms they provide, aside from particular cases such as ensemble methods or the cross-validation process. The main reason for this scarcity is that it is impossible to define a generic framework that suits every mining method. We must study the sequential algorithm carefully, detect the opportunities for multithreading, and reorganize the calculations. We deal with several constraints: we must not increase the memory occupation excessively, we must use all the available cores, and we must balance the load across the threads. Of course, the solution must be simple and operational on ordinary personal computers.

Previously, we implemented a solution for decision tree induction in Sipina 3.5. We also studied the solutions incorporated in Knime and RapidMiner. We showed that the multithreaded programs outperform the single-threaded version, which is entirely natural. But we also observed that there is no unique solution: the internal organization of the multithreaded calculations influences the behavior and the performance of the program. In this tutorial, we present a multithreaded implementation of linear discriminant analysis in SIPINA 3.10. The main property of the solution is that the calculation structure requires the same amount of memory as the sequential program. We note that in some situations, the execution time can be decreased significantly.

Linear discriminant analysis is interesting in our context. It yields a linear classifier whose classification performance is similar to that of the other linear methods on most real databases, especially compared with logistic regression, which is very popular (Saporta, 2006 – page 480; Hastie et al., 2013 – page 128). But the computation of the discriminant analysis is much faster by comparison. We will see that this advantage can be enhanced when we exploit the multicore architecture.

To better evaluate the improvements induced by our strategy, we compare our execution times with those of tools such as SAS 9.3 (proc discrim), R (lda of the MASS package) and Revolution R Community (an "optimized" version of R).

Keywords: sipina, multithreading, thread, multithreaded data mining, multithread processing, linear discriminant analysis, sas, proc discrim, R software, lda, MASS package
Components: LINEAR DISCRIMINANT ANALYSIS
Tutorial: en_Tanagra_Sipina_LDA_Threads.pdf
Dataset: multithreaded_lda.zip
References:
Tanagra, "Multithreading for decision tree induction".
S. Rathburn, A. Wiesner, S. Basu, "STAT 505: Applied Multivariate Statistical Analysis", Lesson 10: Discriminant Analysis, PennState, Online Learning: Department of Statistics.

Thursday, May 23, 2013

Sipina - Version 3.10

The linear discriminant analysis has been enhanced. All operations are now performed in a single pass over the data.
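
To give an idea of what a single pass means here, the sketch below reads each row once and accumulates, for each class, the count, the sums and the sums of cross-products; the pooled within-class covariance matrix used by the discriminant analysis is then derived from these sums alone. This is a minimal illustration in Python under our own assumptions, not the Sipina code: the names lda_statistics and pooled_covariance are hypothetical, and the "sum of products minus product of sums" formula is the textbook one-pass form, which can lose precision when the means are large relative to the variances.

```python
def lda_statistics(rows):
    """One pass over (label, features) rows: counts, sums, cross-product sums."""
    stats = {}
    for label, x in rows:
        p = len(x)
        if label not in stats:
            stats[label] = {"n": 0,
                            "sum": [0.0] * p,
                            "cross": [[0.0] * p for _ in range(p)]}
        s = stats[label]
        s["n"] += 1
        for j in range(p):
            s["sum"][j] += x[j]
            for k in range(p):
                s["cross"][j][k] += x[j] * x[k]
    return stats


def pooled_covariance(stats):
    """Pooled within-class covariance, computed from the accumulated sums only."""
    labels = list(stats)
    p = len(stats[labels[0]]["sum"])
    n_total = sum(s["n"] for s in stats.values())
    within = [[0.0] * p for _ in range(p)]
    for s in stats.values():
        for j in range(p):
            for k in range(p):
                # Scatter of one class: sum(x_j * x_k) - sum(x_j) * sum(x_k) / n
                within[j][k] += s["cross"][j][k] - s["sum"][j] * s["sum"][k] / s["n"]
    dof = n_total - len(labels)  # n - K degrees of freedom
    return [[v / dof for v in row] for row in within]
```

The class means are simply the sums divided by the counts, so the data never need to be read a second time to center the variables.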

A multithreaded version of the linear discriminant analysis has been added. It improves the execution speed by distributing the calculations across all the cores (multi-core processor) or processors (multi-processor machine) available on the computer.

A tutorial will describe the behavior of these new implementations on some large databases.

Keywords: linear discriminant analysis, multithreaded implementation
Sipina website: Sipina
Download: Setup file