A Hybrid Algorithm for Text Classification based on Rough Set

Abstract:

Text classification has become one of the key subjects in intelligent information processing. Because of the complex features of natural language, the feature space is typically very high-dimensional, and improving the accuracy of text classification is an important and difficult problem. Since rough set theory is a useful tool for dealing with uncertain information, this paper proposes a hybrid algorithm for text classification based on rough sets. Rough set theory divides a set into a positive region, a negative region, and a boundary region. In the first stage, therefore, rough sets are used to divide the documents into certain classes and a doubt set. In the second stage, the documents in the doubt set are classified further using the attribute importance degree theory from the information view of rough sets. We find that most documents can be classified with high accuracy in the first stage, and that the second stage relaxes, to some extent, the conditional independence assumption of naive Bayes. Simulation results on general data sets, compared with naive Bayes, support vector machine (SVM), and k-nearest neighbor (KNN), illustrate the efficiency of this algorithm.
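The two-stage scheme in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: documents are assumed to be already reduced to discretized term features, and the toy data set and all function names are hypothetical. Attribute significance is taken as H(D | C - {a}) - H(D | C), one common information-view definition, and these significances are used directly as the weights of a Laplace-smoothed weighted naive Bayes. The exact weighting in the paper may differ.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical toy data: discretized term features (1 = present) plus a label.
TRAIN = [
    ((1, 0, 1), "sport"),
    ((1, 0, 1), "sport"),
    ((0, 1, 0), "finance"),
    ((0, 1, 1), "finance"),
    ((1, 1, 0), "sport"),
    ((1, 1, 0), "finance"),  # indiscernible from the previous doc, other label
]

def equivalence_classes(data):
    """Group documents that are indiscernible on all condition attributes."""
    classes = defaultdict(list)
    for features, label in data:
        classes[features].append(label)
    return classes

def certain_rules(data):
    """Stage 1: an equivalence class whose documents all share one label lies
    in that label's lower approximation (positive region)."""
    return {f: labels[0] for f, labels in equivalence_classes(data).items()
            if len(set(labels)) == 1}

def boundary(data, label):
    """Feature vectors in the upper but not the lower approximation of label."""
    return {f for f, labels in equivalence_classes(data).items()
            if label in labels and len(set(labels)) > 1}

def cond_entropy(data, attrs):
    """H(D | attrs): decision entropy within the partition induced by attrs."""
    groups = defaultdict(list)
    for features, label in data:
        groups[tuple(features[i] for i in attrs)].append(label)
    h = 0.0
    for labels in groups.values():
        p_block = len(labels) / len(data)
        for count in Counter(labels).values():
            p = count / len(labels)
            h -= p_block * p * log(p)
    return h

def significance(data, a, n_attrs):
    """Assumed significance of attribute a: entropy increase when a is removed."""
    rest = [i for i in range(n_attrs) if i != a]
    return cond_entropy(data, rest) - cond_entropy(data, list(range(n_attrs)))

def weighted_nb(data, x, weights, values=(0, 1)):
    """Stage 2: naive Bayes with per-attribute weights and Laplace smoothing."""
    label_counts = Counter(label for _, label in data)
    best, best_score = None, float("-inf")
    for c, nc in label_counts.items():
        score = log(nc / len(data))
        for i, xi in enumerate(x):
            match = sum(1 for f, l in data if l == c and f[i] == xi)
            score += weights[i] * log((match + 1) / (nc + len(values)))
        if score > best_score:
            best, best_score = c, score
    return best

def classify(data, x):
    """Use a certain rule if x falls in a positive region, else the doubt path."""
    rules = certain_rules(data)
    if x in rules:
        return rules[x], "certain"
    weights = [significance(data, i, len(x)) for i in range(len(x))]
    return weighted_nb(data, x, weights), "doubt"

if __name__ == "__main__":
    print(classify(TRAIN, (1, 0, 1)))  # ('sport', 'certain')
    print(classify(TRAIN, (1, 1, 0)))  # ('sport', 'doubt')
```

Here the vector (1, 1, 0) carries both labels, so it sits in the boundary region and falls through to the weighted Bayes stage. In this toy data only the first attribute has nonzero significance, so it alone shifts the decision away from the prior.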

Keywords: text classification; SVM; KNN; rough set; weighted naive Bayes

Author: Weibin Deng

Affiliations: Key Lab of Electronic Commerce and Modern Logistics, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Information Science & Technology, Southwest Jiaotong University, Chengdu, China

Conference type: International conference

Conference: 2011 3rd IEEE International Conference on Computer Research and Development (ICCRD 2011)

Conference location: Shanghai

Conference language: English

Pages: 406-410

Published online: 2011-03-11 (date first posted on the Wanfang platform; not the paper's publication date)
