Conference Topic

Spam filtering based on PLS Feature Extraction

With the arrival of the network era, research on spam filtering technology has become imperative. However, characteristics of mail datasets such as data sparseness, high dimensionality, and multicollinearity in mail content make spam filtering quite different from general text classification. This paper proposes a new Partial Least Squares (PLS) feature extraction method for spam filtering. The method extracts latent semantic components, far fewer in number than the full feature set, as linear combinations of the original features; it thereby compresses the original data and handles multicollinearity better. Experiments on the CEAS 2006 benchmark datasets (Enron-Spam datasets), evaluated following the TREC spam track, show promising results, and the new method performs better than feature selection by the χ² statistic.
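To illustrate how latent components can be formed as linear combinations of term features, the following is a minimal sketch of single-response PLS (PLS1) with NIPALS-style deflation in plain Python. The abstract does not say which PLS variant the authors use, so the algorithm choice, the function name `pls1_scores`, and the toy term-count matrix are all assumptions made for illustration, not the paper's implementation.

```python
import math

def pls1_scores(X, y, n_components):
    """Hypothetical helper: project rows of X onto PLS1 latent components.

    X is a list of n rows with p term counts each; y holds n labels
    (1 = spam, 0 = ham). Returns n rows of component scores.
    """
    n, p = len(X), len(X[0])
    # Center the features and the labels.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]

    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the label.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no label covariance left to explain
        w = [v / norm for v in w]
        # Score vector: one linear combination of features per message.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        # Loadings for features and label, then deflate both.
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append(t)
    # Transpose so that each row holds the scores of one message.
    return [list(row) for row in zip(*components)]

# Invented toy data: 6 messages x 5 terms; the first two terms always
# co-occur, which mimics multicollinearity in mail content.
X = [
    [2, 2, 0, 0, 1],
    [3, 3, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 2, 1, 1],
    [0, 0, 3, 2, 0],
    [0, 0, 1, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

scores = pls1_scores(X, y, n_components=2)
for label, row in zip(y, scores):
    print(label, [round(v, 3) for v in row])
```

In this sketch each message is reduced from five term features to two component scores, which could then be passed to any classifier. The score columns are mutually orthogonal by construction, which is one reason PLS is often considered when the original features are collinear.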

Spam; mail filtering; PLS; feature extraction

Peng-Ming Wang Ming-Wen Wang Guo-Bing Huang

School of Computer Information Engineering, Jiangxi Normal University, NanChang, 330022

Domestic conference

The 3rd National Conference on Information Retrieval and Content Security

Suzhou

English

241-247

2007-11-01 (date first posted online on the Wanfang platform; does not represent the paper's publication date)