Conference Topic

Combining Clustering Coefficient-based Active Learning and Semi-Supervised Learning on Networked Data

Active learning and semi-supervised learning are both important techniques for improving a learned model with unlabeled data when labeled data is difficult to obtain but unlabeled data is plentiful and easy to collect. Combining active learning with a semi-supervised learning algorithm based on Gaussian fields and harmonic functions was suggested recently. That work showed that empirical risk minimization (ERM) can effectively find the next instance to label, but at a large computational cost. When the data has a graph structure, topological analysis of the graph can be used to rapidly select instances that are likely to be good candidates for labeling. This paper describes a novel approach that uses the clustering coefficient metric to identify the best instance to label next. In experiments on the 20 Newsgroups dataset with three binary classification tasks, the clustering coefficient strategy performs similarly to ERM while taking less time.
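As a rough illustration of the idea in the abstract, the sketch below computes the local clustering coefficient of each unlabeled node in a small undirected graph and ranks the nodes as labeling candidates. The abstract does not say how the graph is built or whether high or low clustering is preferred, so the toy graph, the function names `local_clustering` and `rank_candidates`, and the choice to rank the highest coefficient first are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Local clustering coefficient: the fraction of pairs of neighbours
    of `node` that are themselves connected by an edge."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to close
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def rank_candidates(adj, labeled):
    """Rank unlabeled nodes by clustering coefficient.

    Assumption: the highest coefficient is queried first. The source
    abstract does not state the direction of the selection rule.
    """
    scores = {n: local_clustering(adj, n) for n in adj if n not in labeled}
    # Sort by descending score, breaking ties by node name for determinism.
    return sorted(scores, key=lambda n: (-scores[n], n))


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

ranking = rank_candidates(adj, labeled={"a"})
```

Computing every coefficient needs only each node's neighbourhood, which is consistent with the abstract's claim that a topological criterion is cheaper than evaluating the expected risk for every candidate.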

Xiaoqi He, Yangguang Liu, Bin Xu, Xiaogang Jin

Ningbo Institute of Technology, Zhejiang University, Ningbo, Zhejiang, 315000, PR China

International conference

The 2010 International Conference on Intelligent Systems and Knowledge Engineering (Fifth International Conference on Intelligent Systems and Knowledge Engineering)

Hangzhou

English

305-309

2010-11-15 (date first made available online on the Wanfang platform; not the paper's publication date)