This course material presents ensemble methods: bagging, random forest, and boosting. These approaches are based on the same guiding idea: a set of base classifiers, learned with a single learning algorithm, is fitted to different versions of the dataset.
For bagging and random forest, the models are fitted independently on bootstrap samples of the data. Random forest incorporates an additional mechanism to “decorrelate” the models, which must here be decision trees: each split is restricted to a random subset of the predictors.
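As a minimal sketch in R (assuming the randomForest package named in the keywords and the built-in iris data; the mtry values are illustrative choices, and treating bagging as the all-predictors special case of random forest is a standard device, not something prescribed by the slides):

library(randomForest)

set.seed(1)  # reproducible bootstrap samples

# Random forest: each of the 500 trees is grown on a bootstrap sample,
# and each split considers only mtry randomly chosen predictors,
# which "decorrelates" the trees.
rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)

# Bagging recovered as a special case: letting every split consider
# all 4 predictors drops the extra decorrelation step.
bag <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 4)

# Out-of-bag error estimates after 500 trees.
rf$err.rate[500, "OOB"]
bag$err.rate[500, "OOB"]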
Boosting works in a sequential fashion: the model at step t is fitted to a weighted version of the sample so as to correct the errors made by the model learned at the preceding step t-1.
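A corresponding sketch with the adabag package from the keywords, whose boosting() function implements AdaBoost on rpart trees (the mfinal and maxdepth values below are illustrative assumptions):

library(adabag)
library(rpart)

set.seed(1)
# AdaBoost: mfinal trees are fitted one after another; after each step
# the misclassified observations receive larger weights, so the next
# tree concentrates on the errors of its predecessor.
fit <- boosting(Species ~ ., data = iris, mfinal = 50,
                control = rpart.control(maxdepth = 3))

# Confusion matrix and error rate of the final weighted ensemble.
pred <- predict(fit, newdata = iris)
pred$confusion
pred$error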
Keywords: bagging, boosting, random forest, decision tree, rpart package, adabag package, randomForest package, R software
Slides: Bagging - Random Forest - Boosting
References:
Breiman L., "Bagging Predictors", Machine Learning, 24, p. 123-140, 1996.
Breiman L., "Random Forests", Machine Learning, 45, p. 5-32, 2001.
Freund Y., Schapire R., "Experiments with a New Boosting Algorithm", International Conference on Machine Learning, p. 148-156, 1996.
Zhu J., Zou H., Rosset S., Hastie T., "Multi-class AdaBoost", Statistics and Its Interface, 2, p. 349-360, 2009.