AN EFFICIENT TEXT CLASSIFICATION RULE EXTRACTION METHOD BASED ON χ² VALUE AND ROUGH SET

Abstract:

In this paper we propose a text classification rule extraction method that is more efficient and more practical than existing similar methods. A proximate rule is first defined based on the characteristics of text classification rule extraction. Features of the text set are then selected according to their χ² values, which also provide feature significance information for further feature selection. Rough set theory is next used to further reduce the features on the discrete decision table. Finally, precise rules or proximate rules are extracted using rough set theory. By fully combining an improved χ² value feature selection with rough set theory, the method avoids both feature reduction on a large-scale decision table and discretization of the decision table. This greatly improves the efficiency of rule extraction and the practicability of the extracted text rules. Experimental results demonstrate the effectiveness of the method.
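As a rough illustration of the χ²-based selection step mentioned in the abstract, the sketch below scores terms with the standard χ² statistic computed from a 2×2 term/class contingency table. It is not the improved χ² variant proposed in the paper, whose details are not given here. The function names `chi_square` and `top_k_terms`, the cell names `a`, `b`, `c`, `d`, and the sample counts are all assumptions made for this example.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score of a term for one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_k_terms(counts, k):
    """Return the k terms with the highest chi-square score.

    counts maps each term to its (a, b, c, d) contingency cells.
    """
    scored = {term: chi_square(*cells) for term, cells in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical counts for a 20-document corpus with a 10-document class.
example_counts = {
    "goal": (10, 0, 0, 10),   # appears only inside the class
    "match": (8, 2, 2, 8),    # mostly inside the class
    "the": (5, 5, 5, 5),      # independent of the class
}
selected = top_k_terms(example_counts, 2)
```

Terms that are independent of the class score zero and are dropped first, so the decision table passed to the rough set reduction step stays small.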

Keywords: CHI value; feature selection; rough set; text classification; rule

Authors: YE WANG; MING-CHUN WANG

Affiliations: Management School, Tianjin University, Tianjin 300072, China; Sports School, Tianjin Normal University; Mathematics Department, Tianjin University of Education and Technology, Tianjin 300222, China

Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (5th IEEE Forum on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 1552-1557

Online publication date: 2006-08-13 (date first posted on the Wanfang platform; not the paper's publication date)
