Sunday, April 22, 2012

Pentaho Data Integration - Kettle

The Pentaho BI Suite is an open source Business Intelligence suite with integrated reporting, dashboard, data mining, workflow and ETL capabilities (http://en.wikipedia.org/wiki/Pentaho).

In this tutorial, we discuss the Pentaho BI Suite Community Edition (CE), which is freely downloadable. More precisely, we present Pentaho Data Integration (PDI-CE), also known as Kettle. We briefly show how to load a dataset and perform a simple data analysis. The main goal of this tutorial is to prepare for a subsequent one focused on deploying models designed with Knime, Sipina or Weka by using PDI-CE.

This document is based on the stable 4.0.1 version of PDI-CE.

Keywords: ETL, Pentaho Data Integration, community edition, Kettle, BI, business intelligence, data import, data transformation, data cleansing
Tutorial: PDI-CE
Dataset: titanic32x.csv.zip
References:
Pentaho, Pentaho Community