Thursday, March 29, 2012

Tanagra - Version 1.4.43

A few bugs have been fixed and some new features added.

The computation of the contributions of individuals in PCA (PRINCIPAL COMPONENT ANALYSIS) has been corrected. The values were not valid when working on a subsample of the data file. This error was reported by Mr. Gilbert Laffond.
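
As an illustration only (this is not Tanagra's code), the contribution of an individual to an axis is commonly defined as its squared factor score divided by n times the eigenvalue of the axis. The sketch below assumes uniform weights 1/n and takes n as the number of active individuals, i.e. the size of the subsample; the function name `contributions` is hypothetical.

```python
def contributions(scores):
    """Contributions of the active individuals to one PCA axis.

    Assumptions: `scores` holds the centered factor scores of the active
    (subsample) individuals only, and every individual has weight 1/n.
    """
    n = len(scores)
    # With uniform weights, the eigenvalue is the mean squared score
    # computed over the active individuals.
    eigenvalue = sum(f * f for f in scores) / n
    if eigenvalue == 0:
        raise ValueError("the axis carries no variance")
    # CTR_i = (1/n) * F_i^2 / eigenvalue
    return [f * f / (n * eigenvalue) for f in scores]


ctr = contributions([2.0, -1.0, -1.0, 0.0])
print(ctr, sum(ctr))
```

Under these assumptions the contributions sum to 1 over the active individuals, which is a quick sanity check when only part of the file is used.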

The standardization of the factors after VARIMAX (FACTOR ROTATION) has been corrected so that their variance coincides with the sum of the squares of the correlations with the axes, and thus with the eigenvalue associated with the axis. This modification was suggested by Mr. Gilbert Laffond.
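
To make this quantity concrete, here is a minimal sketch with a hypothetical loadings matrix (correlations between three variables and two axes). It applies an arbitrary orthogonal rotation, not a varimax-optimal one, and computes for each rotated axis the sum of squared loadings, which is the value the factor variance is expected to match. The function names and the numbers are assumptions made for the example.

```python
import math


def rotate(loadings, theta):
    """Rotate a two-column loadings matrix by the angle theta (orthogonal rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


def column_sum_of_squares(loadings):
    """Sum of squared loadings for each axis (column)."""
    n_axes = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_axes)]


# Hypothetical correlations of three variables with two axes.
loadings = [[0.8, 0.3], [0.7, -0.4], [0.2, 0.9]]
rotated = rotate(loadings, math.radians(30))

print(column_sum_of_squares(loadings))  # variance carried by each axis before rotation
print(column_sum_of_squares(rotated))   # variance carried by each axis after rotation
```

The rotation redistributes the variance between the axes, but the total over all axes and the communality of each variable (row sum of squares) are unchanged.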

During the calculation of the confidence intervals of the PLS regression coefficients (PLS CONF. INTERVAL), an error could occur when the requested number of axes was greater than the number of predictor variables. It is now corrected. This error was reported by Mr. Alain Morineau.

In some circumstances, an error could occur in FISHER FILTERING, especially when Tanagra is run under Wine for Linux. Additional checks have been introduced. This error was reported by Mr. Bastien Barchiési.

Checking for missing values is now optional. When processing very large files, speed can be favored by disabling the check; this restores the performance of version 1.4.41 and earlier.

The "COMPONENT / COPY RESULTS" menu sends information in HTML format. It is now compatible with Calc, the spreadsheet of Libre Office 3.5.1; previously it worked with the Excel spreadsheet only. Curiously, copying to OOCalc (the Open Office spreadsheet) is not possible at the present time (Open Office 3.3.0).

Download page: setup

Friday, March 23, 2012

Sipina - Version 3.9

The add-on “SipinaLibrary.oxt” has been added to the distribution. It adds a menu to the OOCalc spreadsheet, which makes it possible to launch SIPINA from a dataset (a range of cells). The add-on works with Open Office (tested with version 3.3.0) and Libre Office (version 3.5.1).

Note that a similar add-on exists for Excel (sipina.xla). It provides a connection between Sipina and Excel.

Keywords: sipina, OOCalc, open office, libre office, add-on, add-in
Sipina website: Sipina
Download: Setup file
References:
Tanagra - SIPINA add-in for Excel
Tanagra - Tanagra add-in for Excel 2007 and 2010
Open Office - http://www.openoffice.org/
Libre Office - http://www.libreoffice.org/

Wednesday, March 21, 2012

RExcel, a bridge between Excel and R

Combining a specialized data mining tool with a spreadsheet is a very interesting idea. Most people know how to use a spreadsheet such as Excel (but also LibreOffice Calc, Open Office Calc, Gnumeric, etc.). Spreadsheets are popular because they make data manipulation very easy.

Many data mining tools can read the XLS or XLSX file formats. But it is even more interesting to implement a bidirectional bridge between the data mining tool and Excel. We can then easily carry out the whole analysis by moving between the tools: transforming the variables in Excel, performing the analysis in the data mining tool, and post-processing the results in Excel.

In this tutorial, we describe the RExcel library for R. It adds a new menu to Excel. We can thus send a dataset to R on the one hand, and retrieve a dataset, or more generally a vector or a matrix, from R on the other hand. The tool is really easy to use.

Keywords: data importation, excel file format, xls, xlsx, addin, add-in, addon, add-on, multiple linear regression
Components: lm, stepAIC, predict
Tutorial: en_Tanagra_RExcel.pdf
Dataset: ventes_regression_rexcel.zip
References:
T. Baier, E. Neuwirth, "Powerful data analysis from inside your favorite application"

Sunday, March 4, 2012

PSPP, an alternative to SPSS

I spend a lot of time analyzing the available free statistical and data mining tools. There is no bad software, but some tools are more appropriate than others for some tasks. Thus, we must identify the one which is best suited to our situation. For that, we must know a large number of tools.

In this tutorial, we describe PSPP. It is presented as an alternative to the well-known SPSS: “PSPP is a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions”. Instead of describing each feature in detail (the documentation is available on the website), we present some statistical techniques. We compare the results with those of Tanagra, R 2.13.2 and OpenStat (build 24/02/2012). This is also a way to validate the tools: if they provide different results, it means that there is a problem somewhere.
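
As a small illustration of this kind of cross-check, the sketch below computes Welch's t statistic and its approximate degrees of freedom (Welch-Satterthwaite) from scratch, so that the values can be compared with those reported by a statistical package. The function names, the tolerance and the sample data are assumptions made for the example; they do not come from the tutorial.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    nx, ny = len(x), len(y)
    # Squared standard errors of the two sample means (unbiased variances).
    sx, sy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


def agree(a, b, tol=1e-6):
    """True when two reported values match up to a small tolerance."""
    return math.isclose(a, b, rel_tol=tol, abs_tol=tol)


# Hypothetical samples, for illustration only.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)
```

A value copied from the output of another tool can then be checked with `agree(t, value_from_other_tool)`; a mismatch beyond rounding is a signal to investigate.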

Keywords: pspp, R software, openstat, spss, descriptive statistics, t-test, welch test, comparison of means, comparison of variances, levene's test, chi-squared test, contingency table, cross tabs, analysis of variance, anova, multiple regression, roc curve, auc, area under curve
Components: MORE UNIVARIATE CONT STAT, GROUP CHARACTERIZATION, CONTINGENCY CHI-SQUARE, LEVENE'S TEST, T-TEST, T-TEST UNEQUAL VARIANCE, PAIRED T-TEST, ONE-WAY ANOVA, MULTIPLE LINEAR REGRESSION, ROC CURVE
Tutorial: en_Tanagra_PSPP.pdf
Dataset: autos_pspp.zip
References:
GNU PSPP, http://www.gnu.org/software/pspp/
R Project for Statistical Computing, http://www.r-project.org/
OpenStat, http://www.statprograms4u.com/

Friday, March 2, 2012

Regression analysis with LazStats (OpenStat)

LazStats is a statistical software package developed by Bill Miller, the father of OpenStat, a tool that has been well known among statisticians for many years. These are tools of the highest quality. OpenStat is one of the tools that I use when I want to validate my own implementations.

Several variants of OpenStat are available. In this tutorial, we study LazStats. It is a version programmed in Lazarus, a development environment which is very similar to Delphi. It is based on the Pascal language. Projects developed in Lazarus benefit from the "write once, compile anywhere" principle, i.e. we write our program on one OS (e.g. Windows), but we can compile it on any OS as long as Lazarus and the compiler are available (e.g. Linux). This idea was proposed by Borland with Kylix some years ago, which made it possible to program a project for both Windows and Linux. Unfortunately, Kylix has been canceled. Lazarus seems to be more mature. In addition, it also enables us to compile the same project for the 32-bit and 64-bit versions of an OS.

In this tutorial, we present some of the regression analysis features of LazStats.

Keywords: linear regression, multiple regression, variable selection, forward, backward, stepwise, simultaneous regression
Tutorial: en_Tanagra_Regression_LazStats.pdf
Dataset: conso_vehicules_lazstats.txt
References:
LazStats - http://www.statprograms4u.com/
Lazarus - http://www.lazarus.freepascal.org/