A FAST DENSITY-BASED CLUSTERING ALGORITHM FOR LARGE DATABASES

Abstract:

DBSCAN is a typical density-based clustering algorithm that can discover clusters of arbitrary shape and handles noise well. However, it is comparatively slow because it performs a neighborhood query for each object, and its density threshold is difficult to set properly. This paper presents a fast density-based clustering algorithm based on DBSCAN. After sorting the objects by their coordinates along a chosen dimension, the new algorithm selects, in order, unlabelled points outside a core object's neighborhood as seeds for expanding clusters, which reduces the number of region queries executed. Objects are also transformed with a kernel function to improve clustering accuracy, which reduces the dependence on the density threshold to some extent. Theoretical analysis indicates that the time complexity of the algorithm is approximately linear. Experiments show that the proposed algorithm is markedly superior to DBSCAN in both efficiency and clustering quality.
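
For context, the per-object cost that the abstract attributes to DBSCAN can be illustrated with a minimal sketch of textbook DBSCAN. This is the baseline only, not the algorithm proposed in the paper, whose seed-selection and kernel-transformation details are not given in the abstract. The names `region_query` and `dbscan`, the brute-force neighborhood search, the parameter values, and the sample points are all assumptions made for illustration.

```python
from math import dist

UNVISITED, NOISE = None, -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (brute force, O(n))."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns (labels, number_of_region_queries).

    labels[i] is a cluster id (0, 1, ...) or -1 for noise.
    """
    labels = [UNVISITED] * len(points)
    queries = 0
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        queries += 1
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue  # already labelled, so it was queried or is a border point
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            queries += 1
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(
                    k for k in j_neighbors
                    if labels[k] is UNVISITED or labels[k] == NOISE
                )
    return labels, queries


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels, queries = dbscan(points, eps=1.5, min_pts=3)
print(labels, queries)
```

In this baseline every object triggers exactly one region query, so the query count equals the number of objects. That count is the quantity the abstract says the proposed seed selection is meant to reduce.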

Keywords: Clustering; DBSCAN; Kernel transformation

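Author: BING LIU

Affiliation: China Securities, Beijing 100031, China

Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (5th IEEE Forum on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 996-1000

Online publication date: 2006-08-13 (the date the record first went online on the Wanfang platform; it does not indicate when the paper was published)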
