Conference Paper

An Efficient Semantic VSM based Email Categorization Method

Email categorization is challenging because of its sparse and noisy feature space. To address this problem, this paper proposes a novel semantic Vector Space Model (sVSM) based on WordNet. The basic idea of sVSM is to select related semantic features that increase the global information and to use them to enrich the semantic features of an email. The proposed sVSM-based categorization method builds the semantic feature vector of an email category by extracting terms from the training emails and enriching these terms with their concept chains in WordNet. The tf*iwf*iwf algorithm is then used to adjust the weights of the semantic feature vector. Experimental evaluations show that the proposed method categorizes emails better than email categorization methods based on the traditional VSM, Bayesian classification, and KNN. Further experiments show that the proposed method yields better accuracy with smaller training sets by emphasizing semantic features when identifying an email category.
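The pipeline summarized above (extract terms, enrich them with concept chains, reweight with tf*iwf*iwf, then match an email against category vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. A tiny hand-written dictionary, CONCEPT_CHAINS, stands in for WordNet hypernym chains. The decay factor for distant concepts is invented for illustration. iwf(t) is assumed to follow the common definition log(N / n_t), where N is the total feature count in the training corpus and n_t is the corpus count of feature t. Cosine similarity is assumed as the matching rule, since the abstract does not name one. All function names (semantic_features, train, weigh, classify) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: each term maps to its concept chain
# (nearest hypernym first). A real system would query WordNet instead.
CONCEPT_CHAINS = {
    "invoice": ["bill", "statement", "document"],
    "receipt": ["acknowledgment", "document"],
    "football": ["field_game", "sport"],
    "tennis": ["court_game", "sport"],
    "basketball": ["court_game", "sport"],
}

def tokenize(text):
    """Lowercase whitespace tokenizer (placeholder for real preprocessing)."""
    return text.lower().split()

def semantic_features(tokens, decay=0.5):
    """Term counts enriched with concept-chain entries.

    A concept at depth d (1 = direct hypernym) receives count * decay**d,
    an assumed scheme so that more distant concepts contribute less.
    """
    features = Counter(tokens)
    for term, count in Counter(tokens).items():
        for depth, concept in enumerate(CONCEPT_CHAINS.get(term, []), start=1):
            features[concept] += count * decay ** depth
    return features

def weigh(tf, iwf):
    """tf * iwf * iwf weighting; features unseen in training are dropped."""
    return {t: c * iwf[t] ** 2 for t, c in tf.items() if t in iwf}

def train(labeled_emails, decay=0.5):
    """Build one weighted semantic feature vector per category."""
    category_tf = {}
    corpus = Counter()
    for text, label in labeled_emails:
        feats = semantic_features(tokenize(text), decay)
        category_tf.setdefault(label, Counter()).update(feats)
        corpus.update(feats)
    total = sum(corpus.values())
    # Assumed definition: iwf(t) = log(total feature count / count of t).
    iwf = {t: math.log(total / n) for t, n in corpus.items()}
    vectors = {label: weigh(tf, iwf) for label, tf in category_tf.items()}
    return vectors, iwf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(text, vectors, iwf, decay=0.5):
    """Return the category whose vector is most similar to the email."""
    query = weigh(semantic_features(tokenize(text), decay), iwf)
    return max(vectors, key=lambda label: cosine(query, vectors[label]))

# Toy training set (invented for illustration only).
training = [
    ("please pay the attached invoice", "finance"),
    ("your receipt for the payment", "finance"),
    ("football match tonight", "sports"),
    ("tennis lessons on saturday", "sports"),
]
vectors, iwf = train(training)
print(classify("basketball game", vectors, iwf))
```

In this toy run, "basketball game" shares no surface term with any training email. It is assigned to "sports" only through the enriched concepts court_game and sport. This mirrors the abstract's point that semantic enrichment helps when training data is small.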

email categorization; vector space model; semantic vector

Zhao Lu, Jianguo Ding

Department of Computer Science and Technology, East China Normal University, Shanghai, China
Software Engineering Institute, East China Normal University, Shanghai, China

International Conference

The 2010 International Conference on Computer Application and System Modeling (ICCASM 2010)

Taiyuan

English

pp. 525-530

2010-10-22 (date first posted online on the Wanfang platform; not the paper's publication date)