USING CLUSTERING TO ENHANCE TEXT CLASSIFICATION

Abstract:

Enlarging the training set is a common way to obtain more accurate classification results. In the traditional approach, however, training sets are collected manually, and because human labeling effort is costly, it is difficult to build a training set large enough to improve classification performance. To address this problem, this paper proposes a model that obtains training sets automatically. The model combines clustering by semantic similarity based on Latent Semantic Analysis (LSA) with a classification algorithm. Experimental results show that this approach improves classification performance, and further gains may be obtained through future research on feature selection, clustering, classification, and semantic similarity calculation algorithms.
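The abstract does not specify the clustering algorithm, term weighting, similarity threshold, or classifier, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes raw term counts, LSA computed as a truncated SVD of the term-document matrix, and a seeded cosine (spherical) k-means in which a few manually labeled seed documents fix the class centroids and keep their labels. An unlabeled document is added to the training set only if its cosine similarity to the closest class centroid reaches a threshold. The function names `lsa_doc_vectors` and `expand_training_set` and the parameters `k`, `threshold`, and `iters` are hypothetical.

```python
import numpy as np


def lsa_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Project documents into a k-dimensional LSA space via truncated SVD.

    term_doc has shape (n_terms, n_docs); row i of the result is document i,
    scaled to unit length so that a dot product equals cosine similarity.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = (s[:k, None] * vt[:k, :]).T  # (S_k V_k^T)^T, shape (n_docs, k)
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    return docs / np.where(norms == 0, 1.0, norms)


def expand_training_set(term_doc, labels, k=2, threshold=0.8, iters=10):
    """Hypothetical helper: pseudo-label unlabeled documents (label None).

    Seeded cosine k-means: labeled seed documents keep their class; an
    unlabeled document joins the most similar class centroid only if its
    cosine similarity >= threshold, otherwise it stays None (not added).
    """
    docs = lsa_doc_vectors(np.asarray(term_doc, dtype=float), k)
    classes = sorted({c for c in labels if c is not None})
    seed = np.array([c is not None for c in labels])
    assign = np.array([classes.index(c) if c is not None else -1 for c in labels])
    for _ in range(iters):
        # Centroid of each class from the documents currently assigned to it;
        # every class always contains its seeds, so no group is empty.
        centroids = np.stack([docs[assign == j].mean(axis=0) for j in range(len(classes))])
        norms = np.linalg.norm(centroids, axis=1, keepdims=True)
        centroids = centroids / np.where(norms == 0, 1.0, norms)
        sims = docs @ centroids.T  # cosine similarity, shape (n_docs, n_classes)
        candidate = np.where(sims.max(axis=1) >= threshold, sims.argmax(axis=1), -1)
        new_assign = np.where(seed, assign, candidate)  # seeds never change class
        if np.array_equal(new_assign, assign):
            break  # assignments are stable
        assign = new_assign
    return [classes[j] if j >= 0 else None for j in assign]


# Toy data (assumed): terms are ball, goal, team, vote, party, law.
# Each inner list is one document's term counts; transpose so columns are documents.
counts = [[2, 1, 1, 0, 0, 0], [0, 0, 0, 2, 1, 1],
          [1, 2, 1, 0, 0, 0], [0, 0, 0, 1, 1, 2]]
term_doc = np.array(counts).T
print(expand_training_set(term_doc, ["sport", "politics", None, None]))
# → ['sport', 'politics', 'sport', 'politics']
```

The pseudo-labeled documents would then be appended to the training set of whatever classifier is used. The threshold trades recall for precision: a higher value adds fewer documents but is less likely to add mislabeled ones.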

Keywords: Latent Semantic Analysis (LSA); semantic clustering; text classification; clustering

Authors: Lei Liu, Lei Li, Yixin Zhong

Affiliation: Center for Intelligence Science and Technology Research, Beijing University of Posts and Telecommunications, Beijing 100876, China

Conference type: International conference

Conference: China-Ireland International Conference on Information and Communications Technologies 2008 (CIICT 2008)

Conference location: Beijing

Conference language: English

Pages: 1-4

Online publication date: 2008-09-26 (the date the record first went online on the Wanfang platform; not necessarily the paper's publication date)
