Conference topic

An improved ambiguity measure feature selection for text categorization

The high dimensionality of text categorization problems raises major hurdles in applying many sophisticated learning algorithms. Feature selection, which reduces the number of features that represent documents, is therefore essential in text categorization. In this paper, we propose a feature selection method that improves the performance of the Ambiguity Measure (AM) feature selection. We compare the proposed method with four feature selection methods, Information Gain (IG), Ambiguity Measure (AM), Odds Ratio (OR), and Mutual Information (MI), using two classification algorithms (Naive Bayes and Support Vector Machines) on three datasets (20 Newsgroups, Reuters-21578, and WebKB). The experiments show that the proposed method is significantly better than AM and MI and achieves performance comparable to that of IG and OR.

feature selection; text categorization; dimensionality reduction

Zhiying Liu, Jieming Yang

College of Information Engineering, Northeast Dianli University, Jilin, Jilin, China

International conference

2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC 2012)

Nanchang

English

220-223

2012-08-26 (date first made available online on the Wanfang platform; does not represent the paper's publication date)