Semi-supervised clustering algorithm based on small size of labeled data

Abstract:

In many data mining domains, labeled data is very expensive to generate. How to make the best use of labeled data to guide the clustering of unlabeled data is the core problem of semi-supervised clustering. Most semi-supervised clustering algorithms require a certain amount of labeled data and need the values of several parameters to be set, and different values may give different results. In view of this, a new algorithm, called the semi-supervised clustering algorithm based on small size of labeled data, is presented. It uses a small amount of labeled data to expand the labeled dataset by labeling their k-nearest neighbors, and it requires only one parameter. We demonstrate our clustering algorithm on three UCI datasets and compare it with SSDBSCAN [4] and KNN. The experimental results confirm that the accuracy of our clustering algorithm is close to that of the KNN classification algorithm.
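The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of the general idea of growing a labeled set through k-nearest neighbors. The function name `expand_labels`, the use of Euclidean distance, the `-1` marker for unlabeled points, the rule that the closest labeled proposer wins a conflict, and the repeat-until-stable loop are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def expand_labels(X, labels, k):
    """Spread labels from labeled points to their k nearest neighbors.

    X: (n, d) array of points; labels: length-n integers, -1 = unlabeled.
    Returns a new label array. Illustrative sketch, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    labels = np.array(labels, dtype=int)  # work on a copy
    # Pairwise Euclidean distances (assumed metric).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    while True:
        # unlabeled index -> (distance, label) of the closest proposer
        proposals = {}
        for i in np.flatnonzero(labels != -1):
            for j in np.argsort(dist[i])[:k]:
                if labels[j] == -1:
                    candidate = (dist[i, j], labels[i])
                    if j not in proposals or candidate < proposals[j]:
                        proposals[j] = candidate
        if not proposals:
            break  # no labeled point has an unlabeled neighbor left
        for j, (_, label) in proposals.items():
            labels[j] = label
    return labels

# Two well-separated groups with one labeled seed each.
X = [[0.0, 0.0], [1.0, 0.0], [2.5, 0.0],
     [10.0, 0.0], [11.0, 0.0], [12.5, 0.0]]
seeds = [0, -1, -1, 1, -1, -1]
result = expand_labels(X, seeds, k=2)
```

In this sketch the neighbor count `k` is the only tunable value, in keeping with the abstract's claim of a single parameter. Points that never appear among the k nearest neighbors of a labeled point stay at `-1`.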

Keywords: data mining; semi-supervised clustering; label propagation

Authors: Mingwei Leng, Xiaoyun Chen, Jianjun Cheng, Longjie Li

Affiliation: School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China

Conference type: International conference

Conference name: The Second International Conference on Frontiers of Manufacturing and Design Science (ICFMD 2011)

Conference location: Taiwan

Conference language: English

Pages: 4675-4679

Online publication date: 2011-12-11 (the date the record first went online on the Wanfang platform, not the paper's publication date)
