An improved ambiguity measure feature selection for text categorization
The high dimensionality of the feature space in text categorization poses a major obstacle to applying many sophisticated learning algorithms. Feature selection, which reduces the number of features used to represent documents, is therefore essential in text categorization. In this paper, we propose a feature selection method that improves on the performance of the Ambiguity Measure (AM) feature selection method. We compare the proposed method with four feature selection methods, namely Information Gain (IG), Ambiguity Measure (AM), Odds Ratio (OR) and Mutual Information (MI), using two classification algorithms (Naive Bayes and Support Vector Machines) on three datasets (20-Newsgroups, Reuters-21578 and WebKB). The experiments show that the proposed method performs significantly better than AM and MI and achieves performance comparable to that of IG and OR.
feature selection; text categorization; dimensionality reduction
Zhiying Liu, Jieming Yang
College of Information Engineering, Northeast Dianli University, Jilin, Jilin, China
International conference
Nanchang
English
220-223
2012-08-26 (date first posted online on the Wanfang platform; not the paper's publication date)