An improved ambiguity measure feature selection for text categorization

Abstract:

The high dimensionality of text data poses a major obstacle to applying many sophisticated learning algorithms to text categorization. Feature selection, which reduces the number of features used to represent documents, is therefore essential in text categorization. In this paper, we propose a feature selection method that improves the performance of the Ambiguity Measure (AM) feature selection method. We compare the proposed method with four feature selection methods, Information Gain (IG), Ambiguity Measure (AM), Odds Ratio (OR) and Mutual Information (MI), using two classification algorithms (Naive Bayes and Support Vector Machines) on three datasets (20-Newsgroups, Reuters-21578 and WebKB). The experiments show that the proposed method performs significantly better than AM and MI and achieves performance comparable to IG and OR.

Keywords: feature selection; text categorization; dimensionality reduction

Authors: Zhiying Liu, Jieming Yang

Affiliation: College of Information Engineering, Northeast Dianli University, Jilin, Jilin, China

Conference type: International conference

Conference name: 2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC 2012)

Conference location: Nanchang

Conference language: English

Pages: 220-223

Online publication date: 2012-08-26 (date first posted on the Wanfang platform; not the paper's publication date)
