Conference Topic

A Three-Step Clustering Algorithm over an Evolving Data Stream

Distinguishing data that may belong to a new cluster from outliers is a central problem in mining new patterns from evolving data streams. The problem is more pronounced because all clustering algorithms derived from the CluStream framework are distribution-based learners implemented over a sliding window. This paper proposes a three-step clustering algorithm, rDenStream, which is based on DenStream and adds retrospective learning of outliers. During rDenStream clustering, dropped micro-clusters are stored temporarily in external storage; when a new cluster is discovered, these micro-clusters are learned retrospectively to recover data that were previously discarded in error, which improves the accuracy of the new cluster. rDenStream is therefore valuable in applications that require high-accuracy clustering of evolving data. Considering the characteristics of data streams in NIDS, this paper models the arrival times of new-pattern data as a nonhomogeneous Poisson process. Experiments on a standard data set show its advantage over other methods in the early phase of new pattern discovery.
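
The retrospective step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name RetrospectBuffer, the helper merge, the in-memory deque standing in for external storage, and the fixed distance radius are all assumptions made for illustration only.

```python
import math
from collections import deque


class RetrospectBuffer:
    """Hypothetical holding area for dropped micro-clusters.

    Each entry is a (center, weight) summary. A bounded deque stands in
    for the temporary external storage mentioned in the abstract.
    """

    def __init__(self, capacity=1000):
        self.dropped = deque(maxlen=capacity)

    def drop(self, center, weight):
        # Keep a summary of the discarded micro-clusters instead of losing it.
        self.dropped.append((tuple(center), weight))

    def retrospect(self, new_center, radius):
        # Pull back every dropped micro-cluster that lies within `radius`
        # of the newly discovered cluster; leave the rest in the buffer.
        recovered, kept = [], []
        for center, weight in self.dropped:
            if math.dist(center, new_center) <= radius:
                recovered.append((center, weight))
            else:
                kept.append((center, weight))
        self.dropped = deque(kept, maxlen=self.dropped.maxlen)
        return recovered


def merge(center, weight, recovered):
    # Weighted average of the new cluster and the recovered micro-clusters.
    total = weight + sum(w for _, w in recovered)
    merged = tuple(
        (center[d] * weight + sum(c[d] * w for c, w in recovered)) / total
        for d in range(len(center))
    )
    return merged, total
```

In this sketch, a micro-cluster dropped near the origin is recovered when a new cluster later appears close to it, and its weight is folded back into the new cluster's summary; a micro-cluster far from the new cluster stays in the buffer.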

Data mining; evolving data streams; clustering; retrospect learning; non-homogeneous Poisson process

LIU Li-xiong, KANG Jing, GUO Yun-fei, HUANG Hai

National Digital Switching System Engineering & Technological Research Center, Zhengzhou, China

International conference

2009 IEEE International Conference on Intelligent Computing and Intelligent Systems

Shanghai

English

160-164

2009-11-20 (date first made available online on the Wanfang platform; not necessarily the paper's publication date)