Conference topic

A FAST KNN ALGORITHM FOR TEXT CATEGORIZATION

The KNN algorithm is a simple, effective, non-parametric method for text categorization. Its main drawback is the very high cost of similarity computation, which makes traditional KNN impractical for text categorization with high-dimensional data and large sample sets. This paper presents a method called TFKNN (Tree-Fast-K-Nearest-Neighbor) that finds the exact k nearest neighbors quickly. The method builds an SSR-tree for k-nearest-neighbor search in which the child nodes of each non-leaf node are ranked by the distance between their central points and the central point of their parent. The tree is then used to reduce the search scope, which substantially decreases the time spent on similarity computation.
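The abstract does not spell out how the SSR-tree is built or searched, so the following is only a minimal Python sketch of the general idea it describes: keep each node's entries ranked by their distance to the node's central point, then use the triangle inequality to skip entries that cannot be among the k nearest. The names `Node`, `knn_search`, and `leaf_size`, the two-seed split rule, and the use of Euclidean distance are all assumptions made for illustration and are not taken from the paper. For unit-normalized term vectors, ranking by Euclidean distance gives the same order as ranking by cosine similarity.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """Ball-shaped tree node; entries are kept sorted by distance to the center.

    Assumes `points` is a non-empty list of equal-length tuples.
    """

    def __init__(self, points, leaf_size=8, rng=None):
        rng = rng or random.Random(0)
        n = len(points)
        self.center = tuple(sum(col) / n for col in zip(*points))
        self.radius = max(dist(self.center, p) for p in points)
        self.children = []  # (distance from child center to self.center, child)
        self.points = []    # (distance from point to self.center, point), leaves only
        if n > leaf_size:
            a, b = rng.sample(points, 2)  # hypothetical split rule: two random seeds
            near_a = [p for p in points if dist(p, a) <= dist(p, b)]
            near_b = [p for p in points if dist(p, a) > dist(p, b)]
            if near_a and near_b:
                kids = [Node(near_a, leaf_size, rng), Node(near_b, leaf_size, rng)]
                self.children = sorted(
                    ((dist(self.center, c.center), c) for c in kids),
                    key=lambda pair: pair[0],
                )
                return
        self.points = sorted((dist(self.center, p), p) for p in points)


def knn_search(root, query, k):
    """Return (neighbors, dist_calls); neighbors is a list of (distance, point)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    heap = []   # max-heap on distance, stored as (-distance, point)
    calls = 0   # number of distance computations involving the query

    def bound():
        # Distance of the current k-th best neighbor, or infinity if fewer than k.
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node, d_center):
        nonlocal calls
        for d_pc, p in node.points:
            if d_pc - d_center > bound():
                break      # sorted order: every later point is even farther
            if d_center - d_pc > bound():
                continue   # triangle inequality rules this point out
            d = dist(query, p)
            calls += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < bound():
                heapq.heapreplace(heap, (-d, p))
        for d_pc, child in node.children:
            if abs(d_center - d_pc) - child.radius > bound():
                continue   # pruned without touching the child's center
            d_child = dist(query, child.center)
            calls += 1
            if d_child - child.radius <= bound():
                visit(child, d_child)

    calls += 1
    visit(root, dist(query, root.center))
    return sorted((-neg, p) for neg, p in heap), calls
```

Because pruning only discards entries whose lower bound already exceeds the current k-th best distance, the search in this sketch stays exact; the second return value can be compared with the number of stored points to see how many similarity computations were avoided on a given data set.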

KNN; Text categorization; Similarity; SSR-tree

YU WANG; ZHENG-OU WANG

School of Mathematics and Computer Science, Hebei University, Baoding 071002, China; Institute of Systems Engineering, Tianjin University, Tianjin 300072, China

International conference

2007 International Conference on Machine Learning and Cybernetics (IEEE Sixth International Conference on Machine Learning and Cybernetics)

Hong Kong

English

3436-3441

2007-08-19 (date first posted online on the Wanfang platform; not the paper's publication date)