Using DBSCAN Clustering Algorithm in Spam Identifying
Anti-spam mechanisms are currently a focus of Internet research, and spam identification plays an important role in them. Identifying spam efficiently usually requires the ability to identify similar emails, i.e., spam clustering. Existing clustering methods often split similar emails across several groups. To improve the accuracy of spam identification, we present a new clustering method based on the DBSCAN clustering algorithm and the nilsimsa digest algorithm. With this method, all emails that were manually judged to be similar are clustered together. Simulation results show that the clustering method based on DBSCAN and nilsimsa achieves higher clustering accuracy than the other clustering methods. The simulation results also indicate that the shape of the spam digest subspace is irregular.
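The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each email has already been reduced to a fixed-length bit digest (a real nilsimsa digest is 256 bits; short toy integers are used here), and it assumes that similarity is measured by the Hamming distance between digests. The names `hamming` and `dbscan` and the parameters `eps` and `min_pts` are illustrative choices, not taken from the paper.

```python
from typing import List, Optional

NOISE = -1  # label for digests that belong to no cluster


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two digests stored as integers."""
    return bin(a ^ b).count("1")


def dbscan(digests: List[int], eps: int, min_pts: int) -> List[int]:
    """Plain DBSCAN over bit digests, using Hamming distance (an assumption).

    eps     -- maximum Hamming distance for two digests to be neighbours
    min_pts -- minimum neighbourhood size (including the point itself)
               for a digest to count as a core point
    Returns one cluster label per digest; NOISE marks outliers.
    """
    n = len(digests)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if hamming(digests[i], digests[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, keep growing
                queue.extend(reachable)
    return [NOISE if label is None else label for label in labels]


# Toy digests: the first five form a chain in which each digest differs from
# the next by one bit; the last one is far from all the others.
digests = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111, 0xFF00]
labels = dbscan(digests, eps=1, min_pts=2)
print(labels)
```

In this toy run the first and fifth digests differ by four bits, yet they end up in the same cluster because they are connected through intermediate neighbours. This density-based chaining is what lets DBSCAN follow clusters of irregular shape, which is consistent with the abstract's observation about the spam digest subspace.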
DBSCAN; cluster; nilsimsa; spam
Wu Ying, Yang Kai, Zhang Jianzhong
Department of Computer Science, Nankai University, Tianjin, P.R. China
International conference
Shanghai
English
398-402
2010-06-22 (date first made available online on the Wanfang platform; not the paper's publication date)