A Refinement Approach to Handling Model Misfit in Semi-supervised Learning

Abstract:

Semi-supervised learning has been a focus of machine learning and data mining research in the past few years. Various algorithms and techniques have been proposed, from generative models to graph-based algorithms. In this work, we focus on the Cluster-and-Label approaches for semi-supervised classification. Existing cluster-and-label algorithms are based on some underlying models and/or assumptions. When the data fits the model well, the classification accuracy is high; otherwise, it is low. In this paper, we propose a refinement approach to address the model misfit problem in semi-supervised classification. We show that we do not need to change the cluster-and-label technique itself to make it more flexible. Instead, we propose to use successive refinement clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmark data sets shows that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques including Self-training and Tri-training.

Keywords: Semi-supervised learning; model misfit; classification

Authors: Hanjing Su, Ling Chen, Yunming Ye, Zhaocai Sun, Qingyao Wu

Affiliations: Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 51 QCIS, Faculty of Engineering and Information Technology, University of Technology, Sydney 2007, Austr

Conference type: International conference

Conference name: 6th International Conference on Advanced Data Mining and Applications (ADMA 2010)

Conference location: Chongqing

Conference language: English

Pages: 75-86

Online publication date: 2010-11-19 (date first posted on the Wanfang platform; does not indicate the paper's publication date)
