Analyzing Data Distribution for Dynamic Data Sets

Abstract:

In this paper, we discuss the data distribution of data sets that change constantly. In our previous work [1], we analyzed how the distribution changes in multi-dimensional data space and proposed an approach to processing multi-dimensional data sets. Similarity search defines distances between data points and a given query point Q, and efficiently and effectively selects the data points closest to Q. Clusters are subgroups of a data set whose data points are similar to the others in the same subgroup. In [1], we proposed an approach to reconstructing clusters based on K nearest neighbor search results for dynamic data sets. However, in high-dimensional spaces, not all dimensions may be relevant to a given cluster, and natural clusters might not exist in the full data space. In this paper, we extend our work to subspaces and design an algorithm to detect subclusters, which are readjusted continuously as the data set changes and new query requests arrive. The reconstructed subclusters can help improve the performance of future K nearest neighbor searches.

Authors: Yong Shi, Sunpil Kim

Affiliation: Department of Computer Science, Kennesaw State University, 1000 Chastain Rd NW, Kennesaw, GA 30144, USA

Conference type: International conference

Conference name: 2014 International Conference on Management and Engineering (CME 2014)

Conference location: Shanghai

Conference language: English

Pages: 1-7

Online publication date: 2014-05-24 (date first posted on the Wanfang platform; not the paper's publication date)

Conference track

Analyzing Data Distribution for Dynamic Data Sets