A Cluster Description Method for High Dimensional Data Clustering with Categorical Variables
High dimensional data clustering is a persistent difficulty in clustering research. Because the partition of the objects is unknown before clustering is carried out, the final clusters must be presented in an understandable form, which is especially hard when the dimensionality is high. This paper presents a cluster description schema for high dimensional data clustering with categorical variables. The schema uses the supremum and infimum to represent clusters concisely, and based on it a new method is given to assign non-sample objects to the clusters obtained from the sample space. The assignment process requires a single scan of the dataset, updates the cluster descriptions dynamically, and can detect isolated objects. Experiments on both synthetic and real data show its effectiveness and scalability.
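The abstract does not define the supremum and infimum of a cluster, so the following Python sketch is only one plausible reading, not the authors' method. It assumes that, for each categorical attribute, the supremum is the set of all values seen among the cluster's members and the infimum is the set of values shared by every member. The names `ClusterDescription`, `coverage`, `assign`, and the `threshold` parameter are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterDescription:
    # ASSUMPTION: per-attribute sets; the paper's exact definitions may differ.
    supremum: list  # supremum[j]: every value of attribute j seen in the cluster
    infimum: list   # infimum[j]: values of attribute j shared by all members
    size: int = 0

    @classmethod
    def from_objects(cls, objects):
        """Build a description from the sample objects of one cluster."""
        first, *rest = objects
        desc = cls([{v} for v in first], [{v} for v in first], 1)
        for obj in rest:
            desc.add(obj)
        return desc

    def add(self, obj):
        """Update the description dynamically when an object joins."""
        for j, v in enumerate(obj):
            self.supremum[j].add(v)   # supremum can only grow
            self.infimum[j] &= {v}    # infimum can only shrink
        self.size += 1

    def coverage(self, obj):
        """Fraction of the object's attribute values found in the supremum."""
        hits = sum(v in self.supremum[j] for j, v in enumerate(obj))
        return hits / len(obj)


def assign(objects, clusters, threshold=0.5):
    """Single scan: label each object with a cluster index, or None if isolated."""
    labels = []
    for obj in objects:
        scores = [c.coverage(obj) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__)
        if scores[best] < threshold:
            labels.append(None)       # hypothetical isolation rule
        else:
            clusters[best].add(obj)
            labels.append(best)
    return labels


# Toy usage with made-up data (three categorical attributes per object).
clusters = [
    ClusterDescription.from_objects([("red", "small", "round"), ("red", "small", "oval")]),
    ClusterDescription.from_objects([("blue", "large", "square"), ("green", "large", "square")]),
]
labels = assign(
    [("red", "small", "oval"), ("blue", "large", "flat"), ("black", "tiny", "flat")],
    clusters,
)
print(labels)
```

Under these assumptions each non-sample object is read once, the chosen cluster's description is updated in place, and an object that matches no cluster well enough is left unassigned. The coverage score and the threshold are stand-ins for whatever similarity measure the paper actually uses.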
Clustering; Data Mining; Categorical Variables; High Dimensional Space; KDD
Sen Wu, Shujuan Gu
School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
International conference
Changsha
English
32-35
2010-03-13 (date first posted online on the Wanfang platform; does not represent the paper's publication date)