An Improved KNN Text Categorization on Skew Sort Condition
KNN is one of the most frequently used methods for text categorization. High feature dimensionality and a skewed category distribution both affect the performance of the classifier. This paper introduces an improved KNN for skewed category distributions. It addresses the problem that, during K-nearest-neighbor selection, categories with many training texts are easily over-selected. First, text features are selected with an improved information gain method that makes more efficient use of the category distribution information in the training set. Then an improved category-based KNN classifier performs the categorization, which counteracts the tendency to favor large categories in the training set. Experiments show that this method improves KNN classification performance.
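The abstract does not give the formulas for the improved information gain or the improved KNN, so the following is only a minimal sketch of the two-stage pipeline under stated assumptions. It uses standard information gain over binary term presence for feature selection. As a hypothetical stand-in for the paper's category-based correction, each neighbor's similarity is divided by its category's training-set size so that large categories do not dominate the K-neighbor vote. The function names `information_gain`, `select_features` and `knn_classify` are illustrative, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(counts, total):
    """Shannon entropy (bits) of a class-count distribution."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard IG of the binary feature 'term occurs in doc' w.r.t. the label.
    docs: list of sets of terms; labels: parallel list of category names."""
    n = len(docs)
    h_c = entropy(Counter(labels).values(), n)
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_t = sum(with_t.values())
    n_nt = n - n_t
    h_cond = 0.0
    if n_t:
        h_cond += (n_t / n) * entropy(with_t.values(), n_t)
    if n_nt:
        h_cond += (n_nt / n) * entropy(without_t.values(), n_nt)
    return h_c - h_cond


def select_features(docs, labels, m):
    """Keep the m terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return set(ranked[:m])


def cosine(a, b):
    """Cosine similarity of two binary term vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def knn_classify(query, docs, labels, k=3, normalize=True):
    """KNN vote; with normalize=True each neighbor's similarity is divided
    by its category's training size (hypothetical skew correction)."""
    class_size = Counter(labels)
    neighbors = sorted(
        ((cosine(query, d), l) for d, l in zip(docs, labels)), reverse=True
    )[:k]
    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += sim / class_size[label] if normalize else sim
    return max(scores, key=scores.get)


# Toy, skewed training set: four "sport" texts, one "finance" text.
docs = [{"ball", "goal"}, {"ball", "team"}, {"goal", "team"},
        {"ball", "goal", "team"}, {"stock", "market"}]
labels = ["sport", "sport", "sport", "sport", "finance"]
query = {"ball", "market"}
print(select_features(docs, labels, 2))
print(knn_classify(query, docs, labels, normalize=False))  # plain vote
print(knn_classify(query, docs, labels, normalize=True))   # size-normalized vote
```

On this toy data the three nearest neighbors of `query` all have similarity 0.5, two from "sport" and one from "finance". The plain vote picks "sport", while the size-normalized vote picks "finance", illustrating how a per-category correction can offset the bias toward large categories. Dividing by category size is just one simple choice; the paper's actual weighting may differ.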
KNN; feature reduction; feature selection; text categorization
Liu Haifeng, Liu Shousheng, Su Zhan
Institute of Sciences, PLA University of Science and Technology, Nanjing, China
International conference
Taiyuan
English
182-186
2010-10-22 (date first posted online on the Wanfang platform; not the publication date of the paper)