Semi-supervised Text Categorization by Considering Sufficiency and Diversity
In text categorization (TC), labeled data is often limited while unlabeled data is ample. This motivates semi-supervised learning for TC, which aims to improve performance by exploiting the knowledge in both labeled and unlabeled data. In this paper, we propose a novel bootstrapping approach to semi-supervised TC. First, we state two basic preferences for successful bootstrapping: sufficiency and diversity. With the diversity preference in mind, we modify the traditional bootstrapping algorithm by training the involved classifiers on random feature subspaces instead of the whole feature space. We then further improve the random feature subspace-based bootstrapping by placing constraints on subspace generation to better satisfy the diversity preference. Experimental evaluation shows the effectiveness of our modified bootstrapping approach in both topic-based and sentiment-based TC tasks.
Sentiment Classification; Semi-supervised Learning; Bootstrapping
Shoushan Li; Sophia Yat Mei Lee; Wei Gao; Chu-Ren Huang
Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, China; CBS, The Hong Kong Polytechnic University, Hong Kong; Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, China
International conference
Second CCF Conference, NLPCC 2013 (2nd Conference on Natural Language Processing and Chinese Computing)
Chongqing
English
105-115
2013-11-15 (date first posted online on the Wanfang platform; not the paper's publication date)