Analysis of Text Classifier and the Improvement of KNN

Abstract:

Text classifiers are widely used tools in data mining. Naive Bayes Classifier (NBC), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) are among the most mature text classification algorithms. This paper briefly introduces the working principles of these three classifiers and the basic KNN algorithm. In particular, we propose two measures to improve the performance of KNN: an optimized algorithm that determines the class of an unknown text by calculating means, and a method for classifying texts that do not belong to any class in the training set. Both measures are validated by experiments.

Keywords: text classifier; distance; KNN

Authors: Yuqing Zhang, Kexian Wu, Xin Chen

Affiliation: School of Information Engineering, China University of Geosciences, Beijing, China

Conference type: International conference

Conference name: 2012 International Conference on Future Communication and Computer Technology (ICFCCT 2012)

Conference location: Harbin

Conference language: English

Pages: 307-311

Online publication date: 2012-05-19 (date first posted on the Wanfang platform; not necessarily the paper's publication date)

Conference topic:

Analysis of Text Classifier and the Improvement of KNN