Clustering Algorithm Based On Sparse Feature Vector for Interval-Scaled Variables

Abstract:

This paper proposes a two-step algorithm, Clustering Algorithm Based On Sparse Feature Vector for Interval-Scaled Variables (CABOSFV_I), for clustering high-dimensional sparse data. In the first step, the algorithm decomposes the high-dimensional problem into several low-dimensional ones. In the second step, it obtains the final clusters by clustering the first-step results again. Because irrelevant attributes are removed from each cluster in the first step, the algorithm effectively reduces dimensionality. In addition, it compresses the data by representing clusters with Sparse Feature Vectors, which greatly reduces the data volume without affecting clustering quality. Owing to this dimension reduction and data compression, the algorithm finds clusters in large high-dimensional datasets both effectively and efficiently.
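The abstract describes the method only at a high level. As a hedged illustration of the two ideas it names (compressing each cluster into a small summary that can be merged without rescanning raw records, and a second clustering pass over the first-step clusters), here is a minimal Python sketch. The summary fields (`n`, `cnt`, `lo`, `hi`), the dissimilarity formula, the thresholds `t1` and `t2`, and all function names are hypothetical assumptions; they are not the paper's Sparse Feature Vector or CABOSFV_I definitions. Attribute values are assumed to be pre-scaled to [0, 1].

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Sparse record: attribute -> value; values assumed pre-scaled to [0, 1] (assumption).
Record = dict[str, float]

@dataclass
class SparseSummary:
    """HYPOTHETICAL cluster summary, not the paper's Sparse Feature Vector."""
    n: int = 0                                           # number of raw records summarized
    cnt: dict[str, int] = field(default_factory=dict)    # records having each attribute
    lo: dict[str, float] = field(default_factory=dict)   # per-attribute minimum value
    hi: dict[str, float] = field(default_factory=dict)   # per-attribute maximum value

    @classmethod
    def of(cls, rec: Record) -> SparseSummary:
        # A single record becomes a one-member summary.
        return cls(1, {a: 1 for a in rec}, dict(rec), dict(rec))

    def merged(self, other: SparseSummary) -> SparseSummary:
        # Combine two summaries using only summary data (no raw records needed).
        out = SparseSummary(self.n + other.n, dict(self.cnt), dict(self.lo), dict(self.hi))
        for a, c in other.cnt.items():
            out.cnt[a] = out.cnt.get(a, 0) + c
            out.lo[a] = min(out.lo.get(a, other.lo[a]), other.lo[a])
            out.hi[a] = max(out.hi.get(a, other.hi[a]), other.hi[a])
        return out

    def shared(self) -> list[str]:
        # Attributes present in every member: the cluster's retained subspace.
        # The remaining attributes play the role of "irrelevant" ones in this sketch.
        return [a for a, c in self.cnt.items() if c == self.n]

    def dissimilarity(self) -> float:
        # Hypothetical measure: each partially present attribute costs 1,
        # each shared attribute costs its value spread (hi - lo).
        if not self.cnt:
            return 0.0
        shared = set(self.shared())
        cost = sum(1.0 if a not in shared else self.hi[a] - self.lo[a] for a in self.cnt)
        return cost / len(self.cnt)

def one_pass(items: list[SparseSummary], threshold: float) -> list[SparseSummary]:
    """Scan items once; add each to the first cluster whose merged dissimilarity stays within threshold."""
    clusters: list[SparseSummary] = []
    for item in items:
        for i, c in enumerate(clusters):
            m = c.merged(item)
            if m.dissimilarity() <= threshold:
                clusters[i] = m
                break
        else:
            clusters.append(item)
    return clusters

def two_step(records: list[Record], t1: float, t2: float) -> list[SparseSummary]:
    # Step 1: compress raw records into small, low-dimensional clusters.
    first = one_pass([SparseSummary.of(r) for r in records], t1)
    # Step 2: cluster the step-1 summaries themselves (the second clustering).
    return one_pass(first, t2)

records = [{"x": 0.2, "y": 0.3}, {"x": 0.25, "y": 0.35}, {"z": 0.9}, {"z": 0.85, "w": 0.1}]
for c in two_step(records, t1=0.1, t2=0.6):
    print(c.n, c.shared())
```

Because merging works on summaries alone, the second step never revisits the raw records, which is the property the abstract attributes to data compression; the specific measure and thresholds here are illustrative choices only.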

Keywords: Clustering; Data Mining; Sparse Data; High Dimensional Space

Authors: Sen Wu, Guiying Wei, Shujuan Gu, Xiaofang Ma

Affiliation: School of Economics and Management, University of Science and Technology of Beijing, Beijing, P.R. China

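Conference type: International conference

Conference name: 3rd IEEE International Conference on Wireless Communications, Networking and Mobile Computing

Conference location: Shanghai

Conference language: English

Online publication date: 2007-09-21 (date first posted on the Wanfang platform; not the paper's publication date)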
