Self-adaptive Change Detection in Streaming Data with Non-stationary Distribution
A non-stationary distribution, in which the data distribution evolves over time, is a common issue in many application fields, e.g., intrusion detection and grid computing. Detecting changes in massive streaming data with a non-stationary distribution helps to raise alarms on anomalies, to clean noise, and to report new patterns. In this paper, we propose a novel approach for detecting changes in streaming data, with the purpose of improving the quality of data stream modeling. By observing outliers, the approach uses a weighted standard deviation to monitor how the distribution of the data stream evolves. A cumulative statistical test, Page-Hinkley, is employed to collect evidence of changes in the distribution. The parameter used for reporting changes is adjusted self-adaptively according to the distribution of the data stream, rather than set to a fixed empirical value. This self-adaptability makes data stream modeling more effective by catching distribution changes in a timely manner. We validated the approach on an online clustering framework with the benchmark KDD Cup 1999 intrusion detection data set as well as with a real-world grid data set. The validation results show that it achieves higher accuracy and a lower percentage of outliers than other change detection approaches.
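To make the cumulative test concrete, here is a minimal sketch of the textbook Page-Hinkley test for an upward shift in the mean of a numeric stream. It accumulates m_T = sum over t of (x_t - mean_T - delta) and signals a change when m_T - min(m_t) exceeds a threshold lambda. This is not the authors' implementation: the class name `PageHinkley` and the parameters `delta` and `threshold` are illustrative assumptions, the threshold is fixed here (the paper instead adjusts it self-adaptively, by a rule not reproduced in this abstract), and any numeric stream stands in for the paper's outlier-based weighted standard deviation.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an upward shift in a stream's mean.

    Hypothetical sketch: the threshold is a fixed constant here, whereas
    the paper adapts it to the data distribution.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change per step
        self.threshold = threshold  # lambda: alarm threshold (fixed in this sketch)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the observations seen so far
        self.cum = 0.0      # m_T: cumulative deviation from the running mean
        self.min_cum = 0.0  # M_T: smallest m_t observed so far (m_0 = 0)

    def update(self, x):
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n       # incremental mean update
        self.cum += x - self.mean - self.delta      # accumulate evidence
        self.min_cum = min(self.min_cum, self.cum)
        if self.cum - self.min_cum > self.threshold:  # PH_T = m_T - M_T > lambda
            self.reset()  # restart monitoring after reporting a change
            return True
        return False


# Usage: a stream whose mean jumps from 0 to 5 halfway through.
stream = [0.0] * 100 + [5.0] * 100
ph = PageHinkley(delta=0.005, threshold=50.0)
alarms = [i for i, x in enumerate(stream) if ph.update(x)]
print(alarms)
```

The alarm fires a few observations after the jump rather than immediately, because evidence must accumulate past lambda; a smaller threshold reacts faster but raises more false alarms, which is the trade-off that motivates setting this parameter from the data instead of by a fixed empirical value.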
Change detection; Data stream; Self-adaptive parameter setting; Non-stationary distribution
Xiangliang Zhang, Wei Wang
Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology; Interdisciplinary Centre for Security, Reliability and Trust (SnT Centre), University of Luxembourg, Lu
International conference
6th International Conference on Advanced Data Mining and Applications (ADMA 2010)
Chongqing
English
334-345
2010-11-19 (date first posted online on the Wanfang platform; not the paper's publication date)