Sunday, February 19, 2012

Checking missing values in Tanagra

Up to version 1.4.41, Tanagra did not handle missing values. The idea was to force students, who are the main users of Tanagra, to think about and propose the most appropriate solution given the characteristics of their dataset and the goal of their analysis. Tanagra therefore simply truncated the imported file at the first obstacle. This behavior often disconcerted users, especially since no error message was displayed. They wondered why the data were not properly loaded even though everything looked right.

Starting with version 1.4.42, the import of text files (tab separator) and XLS files (Excel 97-2003), as well as the data transfer using the add-in for Excel (up to Excel 2010) and for LibreOffice 3.5/OpenOffice 3.3, have been modified. Tanagra now reads all the rows of the dataset, but it skips rows that are incomplete and/or contain inconsistencies (e.g. a column contains a numeric value whereas it is a discrete attribute). Above all, an explicit error message reports the number of deleted rows, so users are better informed.
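
The row-skipping behavior described above can be illustrated with a short Python sketch. This is not Tanagra's actual code, only a minimal approximation under stated assumptions: the function name `load_complete_rows`, the column names, and the sample values are all hypothetical, and the rule that a non-numeric value in a continuous column is also an inconsistency is our assumption (the text only mentions the numeric-value-in-discrete-column case).

```python
import csv
import io


def is_number(cell):
    """Return True if the cell can be parsed as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def load_complete_rows(text, discrete_columns):
    """Read tab-separated text and keep only complete, consistent rows.

    Hypothetical helper, not part of Tanagra.
    Returns (header, kept_rows, skipped_count).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    kept, skipped = [], 0
    for row in reader:
        # Incomplete row: wrong number of cells or at least one empty cell.
        if len(row) != len(header) or any(c.strip() == "" for c in row):
            skipped += 1
            continue
        # Inconsistent row: a numeric value in a discrete column, or
        # (our assumption) a non-numeric value in a continuous column.
        inconsistent = False
        for name, cell in zip(header, row):
            if name in discrete_columns and is_number(cell):
                inconsistent = True
            elif name not in discrete_columns and not is_number(cell):
                inconsistent = True
        if inconsistent:
            skipped += 1
            continue
        kept.append(row)
    return header, kept, skipped


# Hypothetical sample: row 2 has an empty cell, row 3 has a numeric
# value in the discrete column "sex".
SAMPLE = "age\tweight\tsex\n47\t71\tM\n56\t\tF\n46\t116\t1\n39\t58\tF\n"

header, rows, n_skipped = load_complete_rows(SAMPLE, discrete_columns={"sex"})
print(f"{len(rows)} rows loaded, {n_skipped} rows skipped")
```

Every row is read and the skipped rows are counted rather than stopping at the first problem, which is the difference from the earlier truncating behavior.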

In this tutorial, we show how missing data are handled when we send the data from Excel to Tanagra using the Tanagra.xla add-in. Some cells are empty in the Excel data range. This example illustrates the new behavior of Tanagra. We would get the same behavior if we imported the XLS file directly or if we imported the corresponding file in the TXT format.

Keywords: missing values, missing data, inconsistent values, text file format importation, excel file format importation, add-in, tanagra.xla
Components: DATASET, VIEW DATASET
Tutorial: en_Tanagra_Missing_Data_Checking.pdf
Dataset: ronflement_with_missing_empty.zip
References:
Wikipedia, "Listwise deletion".
D.C. Howell, "Treatment of missing data".