A Cluster Description Method for High Dimensional Data Clustering with Categorical Variables

Abstract:

Clustering high dimensional data is a persistent difficulty in clustering research. Because the partition of the objects is unknown until clustering completes, the resulting clusters must be presented in an understandable form, which is especially hard in high dimensional spaces. This paper presents a cluster description schema for high dimensional data clustering with categorical variables. The schema uses a supremum and an infimum to represent each cluster concisely. Based on the schema, a new method is given to assign non-sample objects to the clusters obtained from the sample space. The assignment process requires a single scan of the dataset, updates the cluster descriptions dynamically, and can detect isolated objects. Experiments on both synthetic and real data show its effectiveness and scalability.
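The abstract does not define the supremum, the infimum, or the assignment rule, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that each object is a set of (attribute, value) pairs, that a cluster's infimum is the set of pairs shared by all members, and that its supremum is the set of pairs held by at least one member. The names `ClusterDescription`, `mismatch`, `assign`, and the threshold `max_mismatch` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

Item = Tuple[int, str]  # (attribute index, categorical value)


def encode(record: Sequence[str]) -> FrozenSet[Item]:
    """Turn a categorical record into a set of (attribute, value) pairs."""
    return frozenset(enumerate(record))


@dataclass
class ClusterDescription:
    # Assumed meaning: pairs shared by every member of the cluster.
    infimum: Set[Item]
    # Assumed meaning: pairs held by at least one member of the cluster.
    supremum: Set[Item]
    size: int = 0

    @classmethod
    def from_members(cls, members: Sequence[Sequence[str]]) -> "ClusterDescription":
        """Build a description from a non-empty list of sample members."""
        items = [set(encode(m)) for m in members]
        return cls(set.intersection(*items), set.union(*items), len(items))

    def mismatch(self, obj: FrozenSet[Item]) -> int:
        """Hypothetical dissimilarity: pairs of obj never seen in the cluster."""
        return len(obj - self.supremum)

    def absorb(self, obj: FrozenSet[Item]) -> None:
        """Update the description after obj joins the cluster."""
        self.infimum &= obj
        self.supremum |= obj
        self.size += 1


def assign(
    records: Sequence[Sequence[str]],
    clusters: List[ClusterDescription],
    max_mismatch: int,
) -> List[Optional[int]]:
    """Single pass over records; None marks an object treated as isolated."""
    labels: List[Optional[int]] = []
    for record in records:
        obj = encode(record)
        best = min(range(len(clusters)), key=lambda i: clusters[i].mismatch(obj))
        if clusters[best].mismatch(obj) > max_mismatch:
            labels.append(None)  # too far from every cluster description
        else:
            clusters[best].absorb(obj)  # description changes as objects arrive
            labels.append(best)
    return labels
```

In this sketch each non-sample object is read once, compared only against the compact descriptions rather than against every cluster member, and either absorbed into the closest cluster or flagged as isolated. A different dissimilarity or threshold rule could be substituted without changing the single-scan structure.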

Keywords: Clustering; Data Mining; Categorical Variables; High Dimensional Space; KDD

Authors: Sen Wu, Shujuan Gu

Affiliation: School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China

Conference type: International conference

Conference name: 2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA 2010)

Conference location: Changsha

Conference language: English

Pages: 32-35

Online publication date: 2010-03-13 (the date first posted on the Wanfang platform, not the paper's publication date)
