会议专题

Memory Effect in DBSCAN Algorithm

As a density-based clustering algorithm, DBSCAN plays an important role in data mining. Normally DBSCAN algorithm is computationally expensive, limiting its performance in large scale data sets, especially in high dimensional data sets. The high complexity is rooted from the region queries, a very common operation in density-based algorithms, which brings the complexity of the algorithms to O(n2), where n is the number of database objects. With the help of index structure the complexity can be reduced to O (nlogn), however it is inefficient to create the index structureespecially for high dimensional data sets or large scale databases. In this paper we propose a new concept named memory effect (ME). ME can be used to shrink the scope of region queries to neighboring objects. Based on ME we have improved DBSCAN algorithm evidently, and empirical experiments have shown the improvement in both effectiveness and efficiency. At last, we give the theoretical analysis of MEDBSCAN algorithm and talk about the influence of parameters.

Density-based clustering DBSCAN Algorithm MEDBSCAN Algorithm Memory Effect

LI Jian YU Wei YAN Bao-Ping

Computer Network Information Center Chinese Academy of Sciences, Beijing, China Software College, Bei Hang University, Beijing, China

国际会议

第四届国际计算机新科技与教育学术会议(2009 4th International Conference on Computer Science & Education)

南京

英文

31-36

2009-07-25(万方平台首次上网日期,不代表论文的发表时间)