A LINEAR DBSCAN ALGORITHM BASED ON LSH

Abstract:

The DBSCAN algorithm is widely used because it handles noise points effectively and can cluster data of any type. However, it has two inherent limitations: a high time complexity of O(N log N) and poor scalability to large-scale data. This paper proposes a linear-time DBSCAN based on locality-sensitive hashing (LSH). In the proposed algorithm, the nearest neighbor search is optimized by hashing. Compared with the original DBSCAN algorithm, the time complexity of the improved DBSCAN drops to O(N). Experimentally, the improved DBSCAN significantly reduces running time while maintaining the cluster quality of the results. Moreover, the speedup (the running time of the original DBSCAN algorithm divided by the running time of the improved algorithm) increases with the size and dimensionality of the dataset, and the parameter Eps of the proposed algorithm does not strongly influence the clustering result. These improved properties enable DBSCAN to be applied more broadly.
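
The idea of replacing DBSCAN's neighborhood search with hash lookups can be illustrated with a minimal Python sketch. The abstract does not say which hash family or parameters the authors use, so everything here is an assumption: a random-projection (p-stable) LSH, a bucket width of 4 * Eps, and the hypothetical names build_lsh_tables, region_query and lsh_dbscan. Because the hash lookup is approximate, some true neighbors may be missed; candidates are always re-checked against the exact distance.

```python
import math
import random
from collections import defaultdict, deque

NOISE = -1
UNVISITED = None


def _key(point, funcs, width):
    # One bucket key per table: a tuple of floor((a . x + b) / width) values.
    return tuple(
        math.floor((sum(a_k * x_k for a_k, x_k in zip(a, point)) + b) / width)
        for a, b in funcs
    )


def build_lsh_tables(points, eps, n_tables=8, n_proj=2, seed=0):
    """Hash all points into several tables (assumed p-stable LSH scheme)."""
    rng = random.Random(seed)
    dim = len(points[0])
    width = 4.0 * eps  # assumed heuristic, not taken from the paper
    tables = []
    for _ in range(n_tables):
        funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(n_proj)
        ]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[_key(p, funcs, width)].append(idx)
        tables.append((funcs, buckets))
    return tables, width


def region_query(points, tables, width, i, eps):
    """Approximate Eps-neighborhood: bucket mates filtered by exact distance."""
    candidates = set()
    for funcs, buckets in tables:
        candidates.update(buckets.get(_key(points[i], funcs, width), ()))
    eps2 = eps * eps
    p = points[i]
    return [
        j for j in candidates
        if sum((a - b) ** 2 for a, b in zip(p, points[j])) <= eps2
    ]


def lsh_dbscan(points, eps, min_pts, **lsh_kwargs):
    """Standard DBSCAN expansion loop, with region queries served by LSH."""
    tables, width = build_lsh_tables(points, eps, **lsh_kwargs)
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, tables, width, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, tables, width, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

Each point is hashed a fixed number of times and expanded at most once, so if bucket sizes stay bounded the total work grows roughly linearly with N; that bound is a property of this sketch under that assumption, not a restatement of the paper's analysis. A call such as lsh_dbscan(points, eps=1.0, min_pts=3) returns one integer label per point, with -1 marking noise.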

Keywords: LSH; DBSCAN; Clustering; Unsupervised learning; Large-scale data

Authors: YI-PU WU, JIN-JIANG GUO, XUE-JIE ZHANG

Affiliation: Department of Computer Science and Engineering, Yunnan University, Kunming 650091, China

Conference type: International conference

Conference name: 2007 International Conference on Machine Learning and Cybernetics (6th IEEE International Conference on Machine Learning and Cybernetics)

Conference location: Hong Kong

Conference language: English

Pages: 2608-2614

Online publication date: 2007-08-19 (date first made available on the Wanfang platform; not the paper's publication date)
