An Improved KNN Based Outlier Detection Algorithm for Large Datasets
Outlier detection is becoming a hot issue in data mining because outliers often carry useful information. In this paper, we propose an improved KNN-based outlier detection algorithm built on two-stage clustering. The first stage partitions the dataset into several clusters and computes the Kth nearest neighbor within each cluster, which avoids scanning the entire dataset for each calculation. The second stage further partitions the clusters obtained in the first stage and prunes a partition as soon as it is determined that it cannot contain outliers, which yields substantial savings in computation. Experimental results on both synthetic and real-life datasets demonstrate that our algorithm is efficient on large datasets.
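The abstract gives no implementation details, so the following Python sketch is only one plausible reading of the idea, not the authors' algorithm. It assumes the outlier score of a point is the distance to its Kth nearest neighbor inside its own cluster, which is an upper bound on the distance computed over the whole dataset. The first stage is replaced by a simple nearest-center assignment with caller-supplied centers, and the partition-level pruning of the second stage is replaced by a simpler per-point early exit against the current top-n cutoff. All function names and parameters (assign_to_clusters, kth_nn_distance, top_n_outliers, centers, k, n) are hypothetical.

```python
import heapq
import math


def assign_to_clusters(points, centers):
    """Group point indices by nearest center (a stand-in for the first clustering stage)."""
    clusters = [[] for _ in centers]
    for i, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(i)
    return clusters


def kth_nn_distance(i, members, points, k, cutoff=None):
    """Distance from points[i] to its k-th nearest neighbor among `members`.

    Returns None as soon as that distance is certain to be <= cutoff, meaning
    the point cannot enter the current top-n list. Returns math.inf when the
    cluster holds fewer than k other points (an assumption of this sketch).
    """
    nearest = []  # negated distances: a max-heap of the k smallest seen so far
    for j in members:
        if j == i:
            continue
        d = math.dist(points[i], points[j])
        if len(nearest) < k:
            heapq.heappush(nearest, -d)
        elif d < -nearest[0]:
            heapq.heapreplace(nearest, -d)
        # The running k-th distance can only shrink, so it is safe to stop here.
        if cutoff is not None and len(nearest) == k and -nearest[0] <= cutoff:
            return None
    return -nearest[0] if len(nearest) == k else math.inf


def top_n_outliers(points, centers, k, n):
    """Return (score, point) pairs for the n largest within-cluster k-th NN distances (n >= 1)."""
    top = []  # min-heap of (score, index); top[0] is the weakest current outlier
    for members in assign_to_clusters(points, centers):
        for i in members:
            cutoff = top[0][0] if len(top) == n else None
            score = kth_nn_distance(i, members, points, k, cutoff)
            if score is None:
                continue  # pruned: cannot beat the weakest current outlier
            if len(top) < n:
                heapq.heappush(top, (score, i))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, i))
    return [(score, points[i]) for score, i in sorted(top, reverse=True)]
```

Calling top_n_outliers(points, centers, k, n) with points given as coordinate tuples returns the n highest-scoring points, strongest first. Restricting the neighbor search to one cluster is what avoids a full pass over the dataset per point; the early exit only skips work for points that could not rank among the top n anyway.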
Data mining; KNN; Outlier detection
Qian Wang, Min Zheng
School of Computer Science, Chongqing University, Chongqing, China
International conference
6th International Conference on Advanced Data Mining and Applications (ADMA 2010)
Chongqing
English
585-592
2010-11-19 (date first posted online on the Wanfang platform; not the paper's publication date)