An improved method of term weighting for text classification
In text classification, term weighting methods assign appropriate weights to terms in order to improve classification performance. Traditional term weighting algorithms consider only factors such as tf (term frequency) and idf (inverse document frequency). This approach simply treats low-frequency terms as important and high-frequency terms as unimportant, so it frequently assigns higher weights to rare terms. In this paper, we present an effective term weighting approach that avoids this deficiency of the traditional approach, and we evaluate it with kNN classifiers on the widely used benchmark data set Reuters-21578. The experimental results show that the new approach can improve classification accuracy.
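To illustrate the traditional scheme the abstract criticizes, the following is a minimal sketch of plain tf-idf weighting. It is not the authors' improved method, which this record does not describe. The function name `tf_idf`, the toy documents, and the choice of raw term counts with a natural-log idf are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = raw term count * ln(N / df),
    one common tf-idf variant (assumed here, not taken from the paper).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus: "oil" occurs in every document, "rise" in only one.
docs = [
    ["oil", "price", "rise"],
    ["oil", "price", "fall"],
    ["oil", "export", "grain"],
]
w = tf_idf(docs)
```

In this toy corpus the term found in every document gets weight zero, while the term found in a single document gets the largest weight. That is the bias toward rare terms that the abstract identifies as a deficiency.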
Text classification; tf-idf; term weighting; kNN
Hua Jiang, Ping Li, Xin Hu, Shuyan Wang
School of Computer Science, Northeast Normal University, Changchun, Jilin Province, China
International conference
Shanghai
English
294-298
2009-11-20 (date first posted online on the Wanfang platform; not the publication date of the paper)