A FAST KNN ALGORITHM FOR TEXT CATEGORIZATION
The KNN algorithm is a simple, effective, non-parametric method for text categorization. Traditional KNN has a serious drawback: computing similarities is very time-consuming, so the method becomes impractical for text categorization with high-dimensional data and large sample sets. This paper presents a method called TFKNN (Tree-Fast-K-Nearest-Neighbor), which finds the exact k nearest neighbors quickly. The method builds an SSR-tree for searching the k nearest neighbors, in which all child nodes of each non-leaf node are ranked by the distance between their central points and the central point of their parent. The search scope is then reduced using the tree, which greatly decreases the time spent on similarity computation.
KNN; Text categorization; Similarity; SSR-tree
YU WANG; ZHENG-OU WANG
School of Mathematics and Computer Science, Hebei University, Baoding 071002, China; Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
International conference
2007 International Conference on Machine Learning and Cybernetics (IEEE 6th International Conference on Machine Learning and Cybernetics)
Hong Kong
English
3436-3441
2007-08-19 (date first posted on the Wanfang platform; not the paper's publication date)
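
The abstract describes ranking child nodes by their distance to the parent's central point and then using that ranking to shrink the search scope. The following is a minimal sketch of that pruning idea only, not the paper's TFKNN or its SSR-tree: it assumes a single cluster, Euclidean distance instead of a text similarity measure, and the triangle inequality as the pruning rule. The names `RankedCluster`, `euclidean`, and `knn` are hypothetical and chosen for illustration.

```python
import bisect
import heapq
import math


def euclidean(a, b):
    """Euclidean distance (an assumption; the paper works with text similarity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class RankedCluster:
    """Hypothetical one-level stand-in for a tree node: points ranked by
    their distance to the cluster's central point."""

    def __init__(self, points):
        dim = len(points[0])
        self.points = points
        self.center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        ranked = sorted((euclidean(p, self.center), i) for i, p in enumerate(points))
        self.radii = [r for r, _ in ranked]   # ascending distances to the center
        self.order = [i for _, i in ranked]   # point indices in the same order

    def knn(self, query, k):
        """Return ([(distance, index), ...], number_of_distances_computed)."""
        if k <= 0:
            return [], 0
        dq = euclidean(query, self.center)
        heap = []  # max-heap of the best k so far, stored as (-distance, index)
        evaluated = 0
        n = len(self.radii)
        right = bisect.bisect_left(self.radii, dq)
        left = right - 1
        while left >= 0 or right < n:
            # Triangle inequality: d(query, p) >= |d(query, center) - d(p, center)|.
            gap_left = dq - self.radii[left] if left >= 0 else math.inf
            gap_right = self.radii[right] - dq if right < n else math.inf
            if gap_left <= gap_right:
                gap, pos = gap_left, left
                left -= 1
            else:
                gap, pos = gap_right, right
                right += 1
            # Gaps are visited in non-decreasing order, so once the lower bound
            # exceeds the current k-th best distance, no remaining point can win.
            if len(heap) == k and gap > -heap[0][0]:
                break
            idx = self.order[pos]
            d = euclidean(query, self.points[idx])
            evaluated += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, idx))
        return sorted((-nd, i) for nd, i in heap), evaluated
```

Because the bound is only a lower bound, the result is still the exact set of k nearest distances; the saving comes from skipping full distance computations for points whose ranked distance to the center is too far from the query's. How much is skipped depends on the data and on k.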