Conference Papers

A Fast Subspace Partition Clustering Algorithm for High Dimensional Data Streams

Data stream clustering is an important research problem in data stream mining. However, discovering clusters of arbitrary shape over high-dimensional data streams has not been well addressed. In this paper, we propose SPDStream, a fast subspace partition clustering algorithm for data streams that adopts a two-phase clustering framework. In the online component, we present the extension of adjacent unit (E-unit), a unit that shares a common edge or vertex with a dense unit. In addition, an improved CD-Tree lattice structure is introduced to store the information of non-empty units, maintain the positional relationships among units, and keep the affiliation between dense units (D-units) and E-units. Outdated units are faded out by a decay function, so that the corresponding micro-clusters are maintained dynamically. In the offline component, the final clusters are generated from all the micro-clusters by searching for D-units within a radius range. Experimental results show that SPDStream achieves higher clustering quality than CluStream, which cannot generate clusters of arbitrary shape. Furthermore, our approach shows better scalability across different dimensionalities and partition granularities.
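The abstract gives only the outline of the two phases, so the following is a minimal sketch of a generic grid-based stream clusterer in that spirit, not the authors' SPDStream. Several pieces are assumptions: a plain dictionary stands in for the improved CD-Tree; the decay function is assumed to be the common exponential form 2^(-λΔt); data are assumed normalized to [0, 1); and the constants `GRANULARITY`, `DECAY_LAMBDA`, `DENSE_THRESHOLD`, the radius parameter, and the names `GridStream` and `offline_clusters` are all hypothetical. "Shares an edge or vertex" is read as "grid indices differ by at most 1 in every dimension", and "searching D-units in radius range" is read as grouping D-units that lie within a given index radius of each other.

```python
# Hypothetical parameters (assumptions, not values from the paper).
GRANULARITY = 10       # intervals per dimension; points assumed to lie in [0, 1)
DECAY_LAMBDA = 0.01    # rate of the assumed fading function 2 ** (-lambda * dt)
DENSE_THRESHOLD = 3.0  # decayed weight at which a unit counts as a D-unit


def chebyshev(u, v):
    """Largest per-dimension difference between two unit indices."""
    return max(abs(a - b) for a, b in zip(u, v))


class GridStream:
    """Dict-based stand-in for the CD-Tree: unit index -> (weight, last update time)."""

    def __init__(self):
        self.units = {}

    def unit_of(self, point):
        # Partition each dimension into GRANULARITY equal intervals.
        return tuple(min(int(x * GRANULARITY), GRANULARITY - 1) for x in point)

    @staticmethod
    def _decayed(weight, last, now):
        return weight * 2 ** (-DECAY_LAMBDA * (now - last))

    def insert(self, point, now):
        # Online phase: decay the unit's old weight, then add the new point.
        key = self.unit_of(point)
        weight, last = self.units.get(key, (0.0, now))
        self.units[key] = (self._decayed(weight, last, now) + 1.0, now)

    def fade(self, now, min_weight=0.1):
        # Remove outdated units whose decayed weight fell below min_weight.
        self.units = {k: (w, t) for k, (w, t) in self.units.items()
                      if self._decayed(w, t, now) >= min_weight}

    def dense_units(self, now):
        return {k for k, (w, t) in self.units.items()
                if self._decayed(w, t, now) >= DENSE_THRESHOLD}

    def extension_units(self, now):
        # E-units: non-empty, non-dense units touching a D-unit at a face, edge, or vertex.
        dense = self.dense_units(now)
        return {k for k in self.units
                if k not in dense and any(chebyshev(k, d) == 1 for d in dense)}


def offline_clusters(dense, radius=1):
    """Offline phase: group D-units reachable from each other within `radius` steps."""
    dense = list(dense)
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:
            u = stack.pop()
            cluster.append(u)
            for v in dense:
                if v not in seen and chebyshev(u, v) <= radius:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(cluster))
    return clusters
```

For example, inserting five points into unit (0, 0) at times 0 to 4 and one point into unit (1, 1) at time 4 makes (0, 0) a D-unit and (1, 1) an E-unit, because the two cells meet at a vertex. A real implementation would replace the linear scans with the lattice structure's stored neighbor links, which is the role the abstract assigns to the CD-Tree.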

data mining; data streams; clustering; subspace partition; CD-Tree lattice structure

Zhongping Zhang, Hao Wang

College of Information Science and Engineering, Yanshan University, Qinhuangdao City, P.R. China

International Conference

2009 IEEE International Conference on Intelligent Computing and Intelligent Systems

Shanghai

English

pp. 491-495

2009-11-20 (date first posted online on the Wanfang platform; not the paper's publication date)