An Improved KNN Based Outlier Detection Algorithm for Large Datasets

Abstract:

Outlier detection has become an active topic in data mining because outliers often carry useful information. In this paper, we propose an improved KNN-based outlier detection algorithm built on two stages of clustering. The first stage partitions the dataset into several clusters and computes the kth nearest neighbor within each cluster, which avoids scanning the entire dataset for each calculation. The second stage partitions the clusters obtained in the first stage and prunes a partition as soon as it is determined that it cannot contain outliers, which yields substantial savings in computation. Experimental results on both synthetic and real-life datasets demonstrate that the algorithm is efficient on large datasets.
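The abstract gives only the outline of the method, so the following is a minimal sketch of the general idea and not the authors' algorithm. It assumes that a point's outlier score is its distance to its kth nearest neighbor, that the n highest-scoring points are wanted, and that sorting on the first coordinate and cutting into equal chunks is an acceptable stand-in for a real clustering step. The names `kth_nn_dist`, `partition`, and `top_outliers` are hypothetical. The pruning rests on one fact: the kth-nearest-neighbor distance measured inside a partition can never be smaller than the one measured over the full dataset, so it serves as an upper bound on the true score.

```python
import heapq
import math


def kth_nn_dist(p, points, k):
    """Distance from p to its k-th nearest neighbour in points.

    Assumes p itself is in points, so index 0 of the sorted list is 0.0.
    Returns infinity when points holds fewer than k other points.
    """
    dists = sorted(math.dist(p, q) for q in points)
    return dists[k] if len(dists) > k else math.inf


def partition(points, n_parts):
    """Assumed stand-in for clustering: sort, then cut into equal chunks."""
    ordered = sorted(points)
    size = math.ceil(len(ordered) / n_parts)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def top_outliers(points, k, n, n_parts):
    """Return the n (score, point) pairs with the largest k-th NN distance."""
    # Stage 1: a cheap upper bound per partition, using only its own points.
    parts = []
    for part in partition(points, n_parts):
        bound = max(kth_nn_dist(p, part, k) for p in part)
        parts.append((bound, part))
    parts.sort(key=lambda t: t[0], reverse=True)

    # Stage 2: exact scores, skipping partitions that cannot beat the cutoff.
    best = []  # min-heap of (score, point), at most n entries
    for bound, part in parts:
        if len(best) == n and bound <= best[0][0]:
            break  # no point here or in later partitions can score higher
        for p in part:
            score = kth_nn_dist(p, points, k)
            if len(best) < n:
                heapq.heappush(best, (score, p))
            elif score > best[0][0]:
                heapq.heapreplace(best, (score, p))
    return sorted(best, reverse=True)
```

Points are expected as tuples of floats, for example `top_outliers(points, k=3, n=1, n_parts=3)`. The exact scoring step here is still a brute-force scan, so the saving in this sketch comes only from the partitions that are skipped.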

Keywords: Data mining; KNN; Outlier detection

Authors: Qian Wang, Min Zheng

Affiliation: School of Computer Science, Chongqing University, Chongqing, China

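Conference type: International conference

Conference name: 6th International Conference on Advanced Data Mining and Applications (ADMA 2010)

Conference location: Chongqing

Conference language: English

Pages: 585-592

Online publication date: 2010-11-19 (date first posted on the Wanfang platform; not the paper's publication date)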
