USING CLUSTERING TO ENHANCE TEXT CLASSIFICATION
Enlarging the training set is a general way to obtain more precise classification results. In the traditional approach, however, training sets are collected manually, so it is difficult to obtain a training set large enough to improve classification performance, because the cost of the human labor involved is prohibitive. To address this problem, this paper proposes a model that obtains training sets automatically. The model combines similarity-based clustering built on LSA with a classification algorithm. Experimental results show that this approach improves classification performance, and further gains are expected from future work on feature selection, clustering, classification, and semantic similarity calculation algorithms.
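The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are projected into a latent space with a truncated SVD (the core of LSA), and an unlabeled document is added to the training set with the label of the most similar seed centroid when its cosine similarity reaches a threshold. The function names, the seed-centroid rule, the rank `k`, and the `threshold` value are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def lsa_doc_vectors(docs, k=2):
    """Project documents into a k-dimensional latent space (truncated SVD)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix: rows are terms, columns are documents.
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    # One row per document: right singular vectors scaled by singular values.
    return Vt[:k].T * s[:k]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is (near) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0

def expand_training_set(seed_docs, seed_labels, unlabeled_docs, k=2, threshold=0.8):
    """Return (document, label) pairs for unlabeled docs close to a seed centroid.

    Assumption: one centroid per label, computed from the seed documents only.
    """
    vecs = lsa_doc_vectors(list(seed_docs) + list(unlabeled_docs), k)
    n_seed = len(seed_docs)
    labels = sorted(set(seed_labels))
    centroids = {
        c: vecs[[i for i in range(n_seed) if seed_labels[i] == c]].mean(axis=0)
        for c in labels
    }
    added = []
    for j, doc in enumerate(unlabeled_docs):
        v = vecs[n_seed + j]
        best = max(labels, key=lambda c: cosine(v, centroids[c]))
        # Keep only confident assignments; ambiguous documents are skipped.
        if cosine(v, centroids[best]) >= threshold:
            added.append((doc, best))
    return added

# Toy usage with made-up documents.
seed_docs = ["stock market shares trading", "football match goal team"]
seed_labels = ["finance", "sports"]
unlabeled_docs = [
    "market trading shares price",
    "team goal football league",
    "market team",  # shares one word with each topic
]
added = expand_training_set(seed_docs, seed_labels, unlabeled_docs)
print(added)
```

The enlarged set (seed documents plus the returned pairs) could then be passed to any classifier. The threshold trades the size of the automatically collected set against the risk of adding mislabeled documents.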
Latent Semantic Analysis (LSA); Semantic Clustering; Text classification; clustering
Lei Liu, Lei Li, Yixin Zhong
Center for Intelligence Science and Technology Research, Beijing University of Posts and Telecommunications, Beijing 100876, China
International conference
Beijing
English
1-4
2008-09-26 (date first posted online on the Wanfang platform; does not represent the paper's publication date)