ACTIVE LEARNING USING LOCALIZED GENERALIZATION ERROR FOR TEXT CATEGORIZATION

Abstract:

Text categorization is an important step in many applications, e.g. webpage classification, search engine indexing and information retrieval. When a huge number of documents is available, active learning can help reduce training time and cost. Moreover, active learning can filter out noisy training samples and may therefore achieve better generalization capability. In this work, we adopt the localized generalization error model for active learning in text categorization. In our approach, the sample yielding the highest generalization error on the unseen samples local to it is selected as the next training sample. Feature extraction from raw documents is also discussed. Experimental results show that the proposed method is effective in reducing the number of training samples while achieving good generalization capability.
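The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the classifier outputs a real-valued score f(x) for a document feature vector x, and it ranks each unlabeled candidate only by a Monte Carlo estimate of the stochastic sensitivity E[(f(x + Δx) − f(x))²], with Δx drawn uniformly from the hypercube [−q, q]^d around x (the Q-neighborhood of the localized generalization error model). Using this sensitivity term alone as a stand-in for the localized generalization error is a simplification assumed here; the functions `stochastic_sensitivity`, `select_next_sample` and `toy_model`, and the parameters `q`, `n_perturb` and `seed`, are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps a feature vector to a real-valued output.
Model = Callable[[Sequence[float]], float]


def stochastic_sensitivity(model: Model, x: Sequence[float], q: float,
                           n_perturb: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2], dx ~ Uniform([-q, q]^d).

    Assumption: this sensitivity term is used as a proxy for the
    localized generalization error in the Q-neighborhood of x.
    """
    rng = random.Random(seed)  # fixed seed: identical perturbations for every x
    y0 = model(x)
    total = 0.0
    for _ in range(n_perturb):
        x_pert = [xi + rng.uniform(-q, q) for xi in x]
        total += (model(x_pert) - y0) ** 2
    return total / n_perturb


def select_next_sample(model: Model, pool: List[Sequence[float]], q: float,
                       n_perturb: int = 200, seed: int = 0) -> int:
    """Return the index of the unlabeled candidate with the highest estimated score."""
    scores = [stochastic_sensitivity(model, x, q, n_perturb, seed) for x in pool]
    return max(range(len(pool)), key=lambda i: scores[i])


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical 1-D scorer: a steep sigmoid whose decision boundary is at 0.5.
    return 1.0 / (1.0 + math.exp(-10.0 * (x[0] - 0.5)))


if __name__ == "__main__":
    pool = [[0.0], [0.5], [1.0]]
    print(select_next_sample(toy_model, pool, q=0.1))  # prints 1
```

With the toy sigmoid scorer, the candidate at 0.5 sits on the steepest part of the decision boundary, so small perturbations change its output the most and it is selected next. Reusing the same seed for every candidate applies identical perturbations to each one, which keeps the scores directly comparable; in a full active learning loop the selected document would be labeled, added to the training set, and the model retrained before the next selection.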

Keywords: Text Categorization; Active Learning; Localized Generalization Error Bound

Authors: DANIEL S. YEUNG, YING ZHANG, WING W. Y. NG, QING-CAI CHEN

Affiliation: Media and Life Science Computing Laboratory, Shenzhen Graduate School, Harbin Institute of Technology

Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (IEEE Fifth Conference on Machine Learning and Cybernetics)

Conference location: Dalian, China

Conference language: English

Pages: 2686-2691

Online publication date: 2006-08-13 (the date the record was first posted on the Wanfang platform, not the paper's publication date)
