Thursday, March 3, 2016

Hyper-threading and solid-state drive

After more than 6 years of good and faithful service, I decided to change my computer. It must be said that the former (Intel Core 2 Quad Q9400 2.66 Ghz - 4 cores - running Windows 7 - 64 bit) began to make disturbing sounds. I am obliged to put music to cover the rumbling of the beast and be able to work quietly.

The choice of the new computer was another matter. I spent the age of the race to the power which is necessarily fruitless anyway, given the rapid evolution of PCs. Nevertheless, I was sensitive to two aspects that I could not evaluate previously: The hyper-threading  technology is effective in programming multithreaded algorithms of data mining? The use of temporary files to relieve the memory occupation takes advantage of SSD  disk technology?

The new PC runs under Windows 8.1 (I wrote the French version of this tutorial one year ago). The processor is a Core I7 4770S (3.1 Ghz). It has 4 physical cores but 8 logical cores with the hyper-threading technology. The system disk is a SSD. These characteristics allows evaluate to their influences on (1) the implementation of multithreaded version of the linear discriminant analysis described in a previous paper (“Load balanced multithreading for LDA”, September 2013), where the number of threads used can be specified by the user; (2) the use of temporary files for the induction of decision trees algorithm, which enables us to handle very large dataset (“Dealing with very large dataset in Sipina”, January 2010; up to 9,634,198 instances and 41 variables).

In this tutorial, we reproduce the two studies using the SIPINA software. Our goal is to evaluate the behavior of these solutions (multi-threaded implementation, copy of data into temporary files to alleviate the memory occupation) on our new machine which, due to its characteristics, should expressly take advantage of them.

Keywords:  hyper-threading, ssd disk, solid-state drive, multithread, multithreading, very large dataset, core i7, sipina, decision tree, linear discriminant analysis, lda
Tutorial: en_Tanagra_Hyperthreading.pdf
References:
Tanagra Tutorial, "Load balanced multithreading for LDA", September 2013.
Tanagra Tutorial, "Dealing with very large dataset in Sipina", January 2010.
Tanagra Tutorial, "Multithreading for decision tree induction", November 2010.