Conference Topic

A Hybrid Algorithm for Text Classification based on Rough Set

Text classification has become one of the key subjects in intelligent information processing. Owing to the complex features of natural language, the dimensionality of the feature space is particularly high, and improving classification accuracy is an important and hard problem. Because rough set theory is a useful tool for dealing with uncertain information, this paper proposes a hybrid algorithm for text classification based on rough sets. Rough set theory divides a set into a positive region, a negative region, and a boundary region. In the first stage, the documents are therefore divided into certain classes and a doubt set. In the second stage, the documents in the doubt set are classified further, based on the attribute importance degree theory in the information view of rough sets. We find that most of the documents can be classified with high accuracy in the first stage. Furthermore, the conditional independence assumption of naive Bayes is relaxed to some extent in the second stage. Simulation results on general data sets, compared with naive Bayes, support vector machine, and k-nearest neighbor, illustrate the efficiency of the algorithm.
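The three-region split mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `rough_regions`, the toy documents, and the binary term attributes are all assumptions made for illustration. Documents with identical attribute values form an indiscernibility block; a block lying entirely inside the target class falls in the positive region, a block that only partly overlaps it falls in the boundary region (the "doubt set"), and the rest falls in the negative region.

```python
from collections import defaultdict


def rough_regions(objects, attrs, target):
    """Split object ids into (positive, boundary, negative) regions.

    objects: dict mapping id -> dict of attribute values (hypothetical data)
    attrs:   attribute names used to decide indiscernibility
    target:  set of ids belonging to the class of interest
    """
    # Group objects that share the same values on the chosen attributes.
    blocks = defaultdict(set)
    for obj_id, values in objects.items():
        blocks[tuple(values[a] for a in attrs)].add(obj_id)

    positive, boundary, negative = set(), set(), set()
    for block in blocks.values():
        if block <= target:        # whole block is in the class
            positive |= block
        elif block & target:       # block only partly overlaps the class
            boundary |= block
        else:                      # block is disjoint from the class
            negative |= block
    return positive, boundary, negative


# Toy example: five documents described by two binary term features.
docs = {
    "d1": {"goal": 1, "market": 0},
    "d2": {"goal": 1, "market": 0},
    "d3": {"goal": 0, "market": 1},
    "d4": {"goal": 1, "market": 1},
    "d5": {"goal": 1, "market": 1},
}
sports = {"d1", "d2", "d4"}  # ids labeled with the class of interest

pos, bnd, neg = rough_regions(docs, ["goal", "market"], sports)
```

Here `d4` and `d5` are indiscernible yet carry different labels, so both land in the boundary region; under the scheme the abstract describes, such documents would be passed to the second-stage classifier.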

text classification; SVM; KNN; rough set; weighted naive Bayes

Weibin Deng

Key Lab of Electronic Commerce and Modern Logistics, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Information Science & Technology, Southwest Jiaotong University, Chengdu, China

International conference

2011 3rd IEEE International Conference on Computer Research and Development (ICCRD 2011)

Shanghai

English

406-410

2011-03-11 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)