Feature selection is a crucial step of the data mining process. In a supervised learning context, detecting the relevant variables is paramount. Moreover, according to the Occam's razor principle, we should always build the simplest possible model.
This tutorial describes the implementation of the MIFS component (Battiti, 1994) in a naive Bayes learning context. It is also interesting because the selection phase is preceded by a feature transformation step in which the continuous descriptors are discretized using the MDLPC algorithm (Fayyad and Irani, 1993).
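Outside Tanagra, the same pipeline can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the tutorial's actual implementation: scikit-learn has no MDLPC discretizer, so equal-frequency binning stands in for it, and the greedy MIFS criterion J(f) = I(f; C) - beta * sum of I(f; s) over the already selected features s (Battiti, 1994) is coded directly; k = 2 and beta = 0.5 are illustrative values.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB

# Load iris and discretize each continuous descriptor into 3 bins.
# Equal-frequency binning is only a stand-in for MDLPC here.
X, y = load_iris(return_X_y=True)
X_disc = np.column_stack([
    pd.qcut(X[:, j], q=3, labels=False, duplicates="drop")
    for j in range(X.shape[1])
])

def mifs(X, y, k=2, beta=0.5):
    """Greedy MIFS: at each step keep the feature f maximizing
    I(f; C) - beta * sum of I(f; s) over already selected features s."""
    remaining = list(range(X.shape[1]))
    selected = []
    while len(selected) < k and remaining:
        scores = {
            f: mutual_info_score(X[:, f], y)
               - beta * sum(mutual_info_score(X[:, f], X[:, s]) for s in selected)
            for f in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

kept = mifs(X_disc, y, k=2, beta=0.5)
print("Selected features:", kept)

# Naive Bayes on the selected discretized features, assessed by 10-fold CV.
nb = CategoricalNB(min_categories=3)
print("CV accuracy: %.3f" % cross_val_score(nb, X_disc[:, kept], y, cv=10).mean())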
Keywords: feature selection, discretization, conditional independence model
Components: Supervised learning, Naive Bayes, MDLPC, MIFS filtering, Cross validation
Tutorial: enFeature_Selection_For_Naive_Bayes.pdf
Dataset: iris.bdm
References:
R. Battiti, "Using mutual information for selecting features in supervised neural net learning", IEEE Transactions on Neural Networks, 5, pp. 537-550, 1994.
U. Fayyad and K. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning", in Proc. of IJCAI, pp. 1022-1027, 1993.