Conference topic

A Study on Feature Selection Methods in Chinese Spam Filtering based on Maximum Entropy Model

Chinese spam filtering, usually treated as a classification task, is attracting increasing attention. In this paper, the Maximum Entropy Model (MEM) is employed as the classifier, and its classification performance is investigated under four feature selection methods: Document Frequency (DF), CHI statistics, Information Gain (IG), and Mutual Information (MI). Experimental results on the CCERT corpus show that DF and CHI are the best and most stable feature selection methods for Chinese spam filtering when MEM is applied. To our knowledge, this is the first comparison of these four feature selection methods with MEM in Chinese spam filtering.
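
For readers unfamiliar with the four criteria named in the abstract, the following is a minimal sketch of how each score is commonly computed for a single term and the spam class. It assumes the standard textbook definitions based on a 2x2 document-count table; the abstract does not give the paper's exact formulas, so the function names `entropy` and `feature_scores` and the count parameters `a`, `b`, `c`, `d` are illustrative assumptions, not the authors' implementation.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def feature_scores(a, b, c, d):
    """Score one term against the spam class (hypothetical helper).

    a: spam documents containing the term
    b: non-spam documents containing the term
    c: spam documents without the term
    d: non-spam documents without the term
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("at least one document is required")

    # Document Frequency: number of documents that contain the term.
    df = a + b

    # CHI statistic for the 2x2 term/class contingency table.
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi = n * (a * d - c * b) ** 2 / denom if denom else 0.0

    # Pointwise Mutual Information between the term and the spam class.
    mi = math.log2(a * n / ((a + c) * (a + b))) if a > 0 else float("-inf")

    # Information Gain: class entropy minus the entropy that remains
    # after splitting the documents on presence/absence of the term.
    ig = (
        entropy([a + c, b + d])
        - ((a + b) / n) * entropy([a, b])
        - ((c + d) / n) * entropy([c, d])
    )
    return {"DF": df, "CHI": chi, "MI": mi, "IG": ig}
```

In a typical pipeline, every candidate term is scored this way, the terms are ranked by the chosen criterion, and only the top-ranked terms are passed to the classifier as features.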

Chinese spam filtering; maximum entropy model; feature selection

Chao Chen, Hanbing Wang, Yitong Wang

College of Computer Science, Sichuan University, Chengdu, China; School of Computer Science and Engineering, University of Electronic Science and Technology of Chengd

International conference

2010 International Conference on Future Information Technology (ICFIT 2010)

Changsha

English

712-716

2010-12-14 (date first posted online on the Wanfang platform; not the paper's publication date)