Tuesday, January 19, 2010

Tanagra - Version 1.4.35

CTP. The method of detection of the right size of the tree is modified for the "Clustering Tree" with post-pruning component (CTP). It relies both on the angle between half-lines at each point on the curve of decreasing the WSS (within-group sum of squares) on the growing sample and the decrease of the same indicator computed on the pruning sample. Compared to the previous implementation, it results in a smaller number of clusters.

Regression Tree. The previous modification is incorporated into the Regression Tree component which is a univariate version of CTP.

C-RT Regression Tree. A new regression tree component was added. It faithfully implements the technique described in the Breiman's and al. (1984) book, including the post-pruning part with the 1-SE Rule (Chapter 8, especially p. 226 about the formula for the variance of the MSE).

C-RT. The report of the induction of decision tree C-RT has been completed. Based on the last column of the post-pruning table, it becomes easier to choose the parameter x (in x-SE Rule) to arbitrarily define the size of the pruned tree.

Some tutorials will describe these various changes soon.