Locally Adaptive Text Classification Based on k-Nearest Neighbors
Due to the exponential growth of documents on the Internet and the emergent need to organize them, the automated categorization of documents into predefined labels has received ever-increasing attention in recent years. Among text classifiers, the k-Nearest Neighbor Classifier (KNNC) is widely used in the text categorization community because of its simplicity and efficiency. However, KNNC still suffers from inductive biases or model misfits that result from its assumptions, such as the presumption that training data are evenly distributed among all categories. In this paper, we propose a new refinement strategy (LAKNNC) for the KNN classifier, which adopts a sum-of-squared-error criterion to adaptively select the contributing part of these neighbors, and classifies the input document in terms of the degree of disturbance it brings to the kernel densities of the selected neighbors. The experimental results indicate that our algorithm, LAKNNC, is not sensitive to the parameter k and achieves significant classification performance improvement on imbalanced corpora.
Keywords: k-nearest neighbor; kernel density estimation; sum-of-squared-error criterion; text classification
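The abstract does not spell out LAKNNC's full procedure, but as background, the baseline it refines can be sketched. Below is a minimal, illustrative plain KNN text classifier, assuming raw term-frequency vectors, cosine similarity, and majority voting (all function names and the toy corpus are our own, not from the paper); LAKNNC would additionally filter these k neighbors via the sum-of-squared-error criterion and score the query by its disturbance of their kernel densities.

```python
import math
from collections import Counter

def tf_vector(tokens):
    """Raw term-frequency vector for a tokenized document."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(train, query_tokens, k=3):
    """Plain majority-vote KNN: rank training docs by similarity
    to the query and vote among the top-k labels."""
    q = tf_vector(query_tokens)
    ranked = sorted(
        ((cosine(q, tf_vector(tokens)), label) for tokens, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus: (tokenized document, category label)
train = [
    ("the match ended in a draw goal scored".split(), "sports"),
    ("team wins the final match goal".split(), "sports"),
    ("stock market prices fall sharply".split(), "finance"),
    ("investors buy shares market rally".split(), "finance"),
]
print(knn_classify(train, "goal in the final match".split(), k=3))  # → sports
```

Note how plain majority voting over k neighbors is exactly where class imbalance hurts: a large category contributes more neighbors regardless of local fit, which is the bias LAKNNC's adaptive neighbor selection is designed to counter.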
Xiao-gao Yu, Department of Information Management, Hubei University of Economics, Wuhan, China
Xiao-peng Yu, Department of Economic Management, Wuhan Institute of Technology, Wuhan, China
International conference
Shanghai
English
2007-09-21 (date first posted on the Wanfang platform; not necessarily the publication date)