Title:
|
Improving feature selection process resistance to failures caused by curse-of-dimensionality effects (English) |
Author:
|
Somol, Petr |
Author:
|
Grim, Jiří |
Author:
|
Novovičová, Jana |
Author:
|
Pudil, Pavel |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
47 |
Issue:
|
3 |
Year:
|
2011 |
Pages:
|
401-425 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the feature selection process itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to curse-of-dimensionality effects. In this paper we discuss three different approaches to alleviating this problem. We present an algorithmic extension applicable to various feature selection methods, capable of reducing excessive feature subset dependence not only on specific training data but also on specific properties of the criterion function. Further, we discuss the concept of criteria ensembles, in which various criteria vote on feature inclusion/removal, and we provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated on examples and summarizing recommendations are given. (English) |
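Note: to make the criteria-ensemble idea from the summary concrete, the following Python sketch shows sequential forward selection in which several criteria vote on which feature to include next. It is a minimal illustration under assumptions only: the two criteria used here (a Fisher-like filter score and the resubstitution accuracy of a nearest-class-mean classifier) and the plurality voting rule are chosen for demonstration and are not the authors' algorithms from the paper.

# Illustrative sketch (assumption, not the paper's implementation):
# sequential forward selection where several criteria "vote" on which
# feature to add next, as a simple stand-in for a criteria ensemble.
import numpy as np

def fisher_score(X, y, subset):
    # Filter-style criterion: sum of per-feature Fisher-like ratios over the subset.
    score = 0.0
    classes = np.unique(y)
    for f in subset:
        xs = X[:, f]
        means = np.array([xs[y == c].mean() for c in classes])
        vars_ = np.array([xs[y == c].var() + 1e-12 for c in classes])
        score += means.var() / vars_.mean()
    return score

def nearest_mean_accuracy(X, y, subset):
    # Wrapper-style criterion: resubstitution accuracy of a nearest class mean classifier.
    Xs = X[:, list(subset)]
    classes = np.unique(y)
    centroids = np.array([Xs[y == c].mean(axis=0) for c in classes])
    d = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return (pred == y).mean()

def forward_selection_with_voting(X, y, criteria, k):
    # Add features one by one; each criterion votes for its preferred candidate,
    # and the candidate with the most votes is included (ties broken by order).
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        votes = {f: 0 for f in remaining}
        for crit in criteria:
            best = max(remaining, key=lambda f: crit(X, y, selected + [f]))
            votes[best] += 1
        winner = max(remaining, key=lambda f: votes[f])
        selected.append(winner)
        remaining.remove(winner)
    return selected

if __name__ == "__main__":
    # Tiny synthetic example: features 0 and 3 are informative by construction.
    rng = np.random.default_rng(0)
    n, d = 100, 10
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += y
    X[:, 3] += 0.5 * y
    print(forward_selection_with_voting(X, y, [fisher_score, nearest_mean_accuracy], k=3))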
Keyword:
|
feature selection |
Keyword:
|
curse of dimensionality |
Keyword:
|
over-fitting |
Keyword:
|
stability |
Keyword:
|
machine learning |
Keyword:
|
dimensionality reduction |
MSC:
|
62G05 |
MSC:
|
62H30 |
MSC:
|
68T10 |
idZBL:
|
Zbl 1218.62065 |
. |
Date available:
|
2011-06-23T12:57:04Z |
Last updated:
|
2013-09-22 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/141593 |
. |
Reference:
|
[1] Brown, G.: A new perspective for information theoretic feature selection.In: Proc. AISTATS ’09, JMLR: W&CP 5 (2009), pp. 49–56. |
Reference:
|
[2] Chang, Ch.-Ch., Lin, Ch.-J.: LIBSVM: A Library for SVM, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm. |
Reference:
|
[3] Das, S.: Filters, wrappers and a boosting-based hybrid for feature selection.In: Proc. 18th Int. Conf. on Machine Learning (ICML ’01), Morgan Kaufmann Publishers Inc. 2001, pp. 74–81. |
Reference:
|
[4] Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature selection for clustering – a filter solution.In: Proc. 2002 IEEE Int. Conf. on Data Mining (ICDM ’02), Vol. 00, IEEE Comp. Soc. 2002, p. 115. |
Reference:
|
[5] Devijver, P. A., Kittler, J.: Pattern Recognition: A Statistical Approach.Prentice Hall 1982. Zbl 0542.68071, MR 0692767 |
Reference:
|
[6] Dutta, D., Guha, R., Wild, D., Chen, T.: Ensemble feature selection: Consistent descriptor subsets for multiple qsar models.J. Chem. Inf. Model. 43 (2007), 3, pp. 989–997. |
Reference:
|
[7] Emmanouilidis, C.: Multiple-criteria genetic algorithms for feature selection in neuro-fuzzy modeling. In: Internat. Conf. on Neural Networks, Vol. 6, 1999, pp. 4387–4392. |
Reference:
|
[8] Frank, A., Asuncion, A.: UCI Machine Learning Repository, 2010. |
Reference:
|
[9] Gheyas, I. A., Smith, L. S.: Feature subset selection in large dimensionality domains. Pattern Recognition 43 (2010), 1, 5–13. 10.1016/j.patcog.2009.06.009 |
Reference:
|
[10] Glover, F. W., Kochenberger, G. A., eds.: Handbook of Metaheuristics.Internat. Ser. Operat. Research & Management Science 5, Springer 2003. Zbl 1058.90002, MR 1975894 |
Reference:
|
[11] Günter, S., Bunke, H.: An evaluation of ensemble methods in handwritten word recognition based on feature selection. In: Proc. ICPR ’04, IEEE Comp. Soc. 2004, pp. 388–392. |
Reference:
|
[12] Guyon, I., Elisseeff, A.: An introduction to variable and feature selection.J. Mach. Learn. Res. 3 (2003), 1157–1182. Zbl 1102.68556 |
Reference:
|
[13] Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. A., eds.: Feature Extraction – Foundations and Applications.Studies in Fuzziness and Soft Comp. 207 Physica, Springer 2006. Zbl 1114.68059 |
Reference:
|
[14] Ho, T. K.: The random subspace method for constructing decision forests. IEEE Trans. PAMI 20 (1998), 8, 832–844. 10.1109/34.709601 |
Reference:
|
[15] Hussein, F., Ward, R., Kharma, N.: Genetic algorithms for feature selection and weighting, a review and study.In: Proc. 6th ICDAR, Vol. 00, IEEE Comp. Soc. 2001, pp. 1240–1244. |
Reference:
|
[16] Jensen, R.: Performing feature selection with ACO. Studies Comput. Intelligence 34, Springer 2006, pp. 45–73. |
Reference:
|
[17] Special issue on variable and feature selection. J. Machine Learning Research, 2003. http://www.jmlr.org/papers/special/feature.html |
Reference:
|
[18] Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms: A study on high-dimensional spaces.Knowledge Inform. Systems 12 (2007), 1, 95–116. 10.1007/s10115-006-0040-8 |
Reference:
|
[19] Kittler, J., Hatef, M., Duin, R. P. W., Matas, J.: On combining classifiers.IEEE Trans. PAMI 20 (1998), 3, 226–239. 10.1109/34.667881 |
Reference:
|
[20] Kohavi, R., John, G. H.: Wrappers for feature subset selection.Artificial Intelligence 97 (1997), 1–2, 273–324. Zbl 0904.68143, 10.1016/S0004-3702(97)00043-X |
Reference:
|
[21] Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF.In: Proc. ECML-94, Springer 1994, pp. 171–182. |
Reference:
|
[22] Kuncheva, L. I.: A stability index for feature selection.In: Proc. 25th IASTED Internat. Mul.-Conf. AIAP’07, ACTA Pr. 2007, pp. 390–395. |
Reference:
|
[23] Lai, C., Reinders, M. J. T., Wessels, L.: Random subspace method for multivariate feature selection.Pattern Recognition Lett. 27 (2006), 10, 1067–1076. 10.1016/j.patrec.2005.12.018 |
Reference:
|
[24] Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining.Kluwer Academic Publishers 1998. Zbl 0908.68127 |
Reference:
|
[25] Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering.IEEE Trans. KDE 17 (2005), 4, 491–502. |
Reference:
|
[26] Nakariyakul, S., Casasent, D. P.: Adaptive branch and bound algorithm for selecting optimal features.Pattern Recognition Lett. 28 (2007), 12, 1415–1427. 10.1016/j.patrec.2007.02.015 |
Reference:
|
[27] Nakariyakul, S., Casasent, D. P.: An improvement on floating search algorithms for feature subset selection.Pattern Recognition 42 (2009), 9, 1932–1940. Zbl 1178.68503, 10.1016/j.patcog.2008.11.018 |
Reference:
|
[28] Novovičová, J., Pudil, P., Kittler, J.: Divergence based feature selection for multimodal class densities.IEEE Trans. PAMI 18 (1996), 2, 218–223. 10.1109/34.481557 |
Reference:
|
[29] Pudil, P., Novovičová, J., Choakjarernwanit, N., Kittler, J.: Feature selection based on the approximation of class densities by finite mixtures of special type.Pattern Recognition 28 (1995), 9, 1389–1398. 10.1016/0031-3203(94)00009-B |
Reference:
|
[30] Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection.Pattern Recognition Lett. 15 (1994), 11, 1119–1125. 10.1016/0167-8655(94)90127-9 |
Reference:
|
[31] Raudys, Š. J.: Feature over-selection.In: Proc. S+SSPR, Lecture Notes in Comput. Sci. 4109, Springer 2006, pp. 622–631. |
Reference:
|
[32] Raykar, V. C., et al.: Bayesian multiple instance learning: automatic feature selection and inductive transfer. In: Proc. ICML ’08, ACM 2008, pp. 808–815. |
Reference:
|
[33] Reunanen, J.: A pitfall in determining the optimal feature subset size.In: Proc. 4th Internat. Workshop on Pat. Rec. in Inf. Systs (PRIS 2004), pp. 176–185. |
Reference:
|
[34] Reunanen, J.: Less biased measurement of feature selection benefits.In: Stat. and Optimiz. Perspectives Workshop, SLSFS, Lecture Notes in Comput. Sci. 3940, Springer 2006, pp. 198–208. |
Reference:
|
[35] Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics.Bioinformatics 23 (2007), 19, 2507–2517. 10.1093/bioinformatics/btm344 |
Reference:
|
[36] Salappa, A., Doumpos, M., Zopounidis, C.: Feature selection algorithms in classification problems: An experimental evaluation.Optimiz. Methods Software 22 (2007), 1, 199–212. Zbl 1116.62069, MR 2272838, 10.1080/10556780600881910 |
Reference:
|
[37] Sebastiani, F.: Machine learning in automated text categorization.ACM Comput. Surveys 34 (2002), 1, 1–47. 10.1145/505282.505283 |
Reference:
|
[38] Sebban, M., Nock, R.: A hybrid filter/wrapper approach of feature selection using information theory.Pattern Recognition 35 (2002), 835–846. Zbl 0997.68115, 10.1016/S0031-3203(01)00084-X |
Reference:
|
[39] Somol, P., Grim, J., Pudil, P.: Criteria ensembles in feature selection.In: Proc. MCS, Lecture Notes in Comput. Sci. 5519, Springer 2009, pp. 304–313. |
Reference:
|
[40] Somol, P., Grim, J., Pudil, P.: The problem of fragile feature subset preference in feature selection methods and a proposal of algorithmic workaround.In: ICPR 2010. IEEE Comp. Soc. 2010. |
Reference:
|
[41] Somol, P., Novovičová, J., Pudil, P.: Flexible-hybrid sequential floating search in statistical feature selection.In: Proc. S+SSPR, Lecture Notes in Comput. Sci. 4109, Springer 2006, pp. 632–639. |
Reference:
|
[42] Somol, P., Novovičová, J.: Evaluating the stability of feature selectors that optimize feature subset cardinality.In: Proc. S+SSPR, Lecture Notes in Comput. Sci. 5342 Springer 2008, pp. 956–966. |
Reference:
|
[43] Somol, P., Novovičová, J., Grim, J., Pudil, P.: Dynamic oscillating search algorithms for feature selection.In: ICPR 2008. IEEE Comp. Soc. 2008. |
Reference:
|
[44] Somol, P., Novovičová, J., Pudil, P.: Improving sequential feature selection methods performance by means of hybridization.In: Proc. 6th IASTED Int. Conf. on Advances in Computer Science and Engrg. ACTA Press 2010. |
Reference:
|
[45] Somol, P., Pudil, P.: Oscillating search algorithms for feature selection.In: ICPR 2000, IEEE Comp. Soc. 02 (2000), 406–409. |
Reference:
|
[46] Somol, P., Pudil, P., Kittler, J.: Fast branch & bound algorithms for optimal feature selection.IEEE Trans. on PAMI 26 (2004), 7, 900–912. 10.1109/TPAMI.2004.28 |
Reference:
|
[47] Sun, Y.: Iterative RELIEF for feature weighting: Algorithms, theories, and applications.IEEE Trans. PAMI 29 (2007), 6, 1035–1051. 10.1109/TPAMI.2007.1093 |
Reference:
|
[48] Tahir, M.-A., et al.: Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Pattern Recognition Lett. 28 (2007), 4, 438–446. 10.1016/j.patrec.2006.08.016 |
Reference:
|
[49] Whitney, A. W.: A direct method of nonparametric measurement selection.IEEE Trans. Comput. 20 (1971), 9, 1100–1103. Zbl 0227.68047 |
Reference:
|
[50] Yang, Y., Pedersen, J. O.: A comparative study on feature selection in text categorization.In: Proc. 14th Internat. Conf. on Machine Learning (ICML ’97), Morgan Kaufmann 1997, pp. 412–420. |
Reference:
|
[51] Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution.In: Proc. 20th Internat. Conf. on Machine Learning (ICML-03), Vol. 20, Morgan Kaufmann 2003, pp. 856–863. |
Reference:
|
[52] Zhu, Z., Ong, Y. S., Dash, M.: Wrapper-filter feature selection algorithm using a memetic framework.IEEE Trans. Systems Man Cybernet., Part B 37 (2007), 1, 70. 10.1109/TSMCB.2006.883267 |
. |