A Safe Approach to Shrink Email Sample Set while Keeping Balance between Spam and Normal

Abstract:

To cover all possible cases when training anti-spam machine learning models, it is crucial to have a safe way to shrink the training sample set. Such a method should remove redundant samples with minimal loss of classification-relevant information while also keeping the distribution of samples balanced. At present, no such solution exists. In this paper, we propose a safe approach that addresses these problems and improves the quality of the training email sample pool (set). A better sample pool yields higher-quality machine learning models and a better anti-spam engine, with unbiased, high spam detection rates and low false positive rates.

Keywords: anti-spam; machine learning; SVM

Authors: Lili Diao, Hao Wang

Affiliation: Trend Micro Inc., Nanjing, China

Conference type: International conference

Conference name: 2009 Third IEEE International Conference on Secure Software Integration and Reliability Improvement (SSIRI 2009)

Conference location: Shanghai

Conference language: English

Pages: 329-334

Online publication date: 2009-07-08 (the date the record was first posted on the Wanfang platform, not the paper's publication date)

Conference session: The 1st Workshop on Testing Technologies and Tools for Critical Industry Applications