Combining Clustering Coefficient-based Active Learning and Semi-Supervised Learning on Networked Data
Active learning and semi-supervised learning are both important techniques for improving a learned model with unlabeled data when labeled data is difficult to obtain but unlabeled data is plentiful and easy to collect. Recent work proposed combining active learning with a semi-supervised learning algorithm based on Gaussian fields and harmonic functions. That work showed that empirical risk minimization (ERM) can effectively select the next instance to label, but ERM is computationally expensive. When the data is graph-structured, we can instead use topological analysis of the graph to rapidly select instances that are likely to be good candidates for labeling. This paper describes a novel approach that uses the clustering coefficient metric to identify the best instance to label next. We experiment on three binary classification tasks drawn from the 20 Newsgroups dataset; the results show that the clustering coefficient strategy performs similarly to ERM while taking less time.
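The abstract names two components without giving formulas: a harmonic-function semi-supervised learner on a graph and a clustering-coefficient rule for choosing the next node to label. The Python sketch below illustrates both on a small unweighted graph. It is not the authors' implementation. The helper names (`build_graph`, `clustering_coefficient`, `harmonic_scores`, `select_next`) are hypothetical, all edge weights are assumed to be 1, the harmonic function is approximated by iterative neighbour averaging rather than a closed-form solve, and the rule of querying the unlabeled node with the *highest* local clustering coefficient is an assumption, since the abstract does not say whether high or low values are preferred. ERM-based selection is not shown.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Undirected, unweighted graph stored as an adjacency map.
Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build a symmetric adjacency map from an edge list."""
    graph: Graph = {}
    for u, w in edges:
        graph.setdefault(u, set()).add(w)
        graph.setdefault(w, set()).add(u)
    return graph


def clustering_coefficient(graph: Graph, v: Hashable) -> float:
    """Local clustering coefficient: share of neighbour pairs of v that are linked."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; treated as 0 here
    # Count each unordered neighbour pair once.
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def harmonic_scores(
    graph: Graph, labels: Dict[Hashable, int], iterations: int = 200
) -> Dict[Hashable, float]:
    """Approximate the harmonic function on an unweighted graph.

    Labeled nodes are clamped to their label (0 or 1); each unlabeled node is
    repeatedly set to the mean of its neighbours' scores (in place, Gauss-Seidel
    style), which converges to the harmonic solution when every connected
    component contains a labeled node.
    """
    f = {v: float(labels.get(v, 0.5)) for v in graph}
    for _ in range(iterations):
        for v in graph:
            if v in labels or not graph[v]:
                continue
            f[v] = sum(f[u] for u in graph[v]) / len(graph[v])
    return f


def select_next(graph: Graph, labels: Dict[Hashable, int]) -> Hashable:
    """Hypothetical query rule: the unlabeled node with the highest clustering
    coefficient; ties go to the earliest node in insertion order."""
    unlabeled = [v for v in graph if v not in labels]
    return max(unlabeled, key=lambda v: clustering_coefficient(graph, v))


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by edge 2-3, plus pendant node 6.
    g = build_graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)])
    labels = {0: 1, 6: 0}
    scores = harmonic_scores(g, labels)
    print({v: round(s, 3) for v, s in scores.items()})
    print("next to label:", select_next(g, labels))  # → next to label: 1
```

With node 0 labeled 1 and node 6 labeled 0, the harmonic scores for nodes 1 to 5 settle at 0.9, 0.8, 0.5, 0.4 and 0.3, and the assumed rule queries node 1. Scoring every candidate by clustering coefficient needs only neighbourhood lookups, which illustrates why a topology-based rule can be cheaper than ERM, which as usually formulated estimates the risk after hypothetically labeling each candidate.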
Xiaoqi He, Yangguang Liu, Bin Xu, Xiaogang Jin
Ningbo Institute of Technology, Zhejiang University, Ningbo, Zhejiang 315000, PR China
International conference
The 2010 International Conference on Intelligent Systems and Knowledge Engineering (the fifth edition of the conference)
Hangzhou
English
pp. 305-309
2010-11-15 (date first posted online on the Wanfang platform; not the paper's publication date)