Conference Topic

A Cluster Description Method for High Dimensional Data Clustering with Categorical Variables

High-dimensional data clustering has long been a difficult problem in clustering research. Because the partition of the objects is unknown before clustering, the resulting clusters must afterwards be presented in an understandable form, which is especially difficult in high-dimensional spaces. This paper presents a cluster description schema for high-dimensional data clustering with categorical variables. The schema uses the supremum and infimum to represent clusters concisely, and on this basis a new method is given for assigning non-sample objects to the clusters obtained from the sample space. The assignment process requires a single scan of the dataset, updates the cluster descriptions dynamically, and can detect isolated objects. Experiments on both synthetic and real data show its effectiveness and scalability.
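As a rough illustration of the idea in the abstract, and not the paper's actual algorithm, the sketch below assumes each object is a set of (variable, value) pairs, takes a cluster's infimum to be the values shared by all members and its supremum to be the values seen in any member, and assigns a new object in one pass. The names `ClusterDescription`, `describe`, `assign`, the cost function, and the `max_cost` threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ClusterDescription:
    infimum: frozenset   # (variable, value) pairs shared by every member
    supremum: frozenset  # (variable, value) pairs seen in at least one member
    size: int


def describe(members):
    """Build a description from a non-empty list of sample objects."""
    sets = [frozenset(m) for m in members]
    return ClusterDescription(
        infimum=frozenset.intersection(*sets),
        supremum=frozenset.union(*sets),
        size=len(sets),
    )


def assign(obj, clusters, max_cost=1):
    """Assign obj to the best-fitting cluster, updating its description.

    The cost is an assumed measure: how many pairs the supremum would gain
    plus how many pairs the infimum would lose. Returns the cluster index,
    or None when every cost exceeds max_cost (obj is treated as isolated).
    """
    obj = frozenset(obj)
    best, best_cost = None, None
    for i, c in enumerate(clusters):
        cost = len(obj - c.supremum) + len(c.infimum - obj)
        if best_cost is None or cost < best_cost:
            best, best_cost = i, cost
    if best is None or best_cost > max_cost:
        return None
    c = clusters[best]
    # Dynamic update: the infimum can only shrink, the supremum only grow.
    clusters[best] = ClusterDescription(c.infimum & obj, c.supremum | obj, c.size + 1)
    return best


clusters = [
    describe([
        {("color", "red"), ("shape", "round")},
        {("color", "red"), ("shape", "square")},
    ]),
    describe([{("color", "blue"), ("shape", "round")}]),
]
fits = assign({("color", "red"), ("shape", "round")}, clusters)
isolated = assign({("color", "green"), ("shape", "triangle")}, clusters)
```

Each non-sample object is examined once and only two sets per cluster are stored, which is one way the single-scan and dynamic-update properties could be realized under these assumptions.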

Clustering; Data Mining; Categorical Variables; High Dimensional Space; KDD

Sen Wu, Shujuan Gu

School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China

International Conference

2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA 2010)

Changsha

English

32-35

2010-03-13 (date first posted online on the Wanfang platform; not the paper's publication date)