A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights
Semi-supervised text clustering, a research branch of text clustering, aims to use limited prior knowledge to aid the unsupervised text clustering process and help users obtain improved clustering results. Because labeled data are difficult, expensive, and time-consuming to obtain, it is important to use the supervised information effectively to improve clustering performance significantly. This paper proposes a semi-supervised LDA text clustering algorithm based on the weights of word distribution (WWDLDA). By introducing coefficients of word distribution obtained from labeled data, the LDA model can be used for semi-supervised clustering. During clustering, these coefficients continually adjust the word distribution to change the clustering results. Experimental results on real data sets show that the proposed algorithm obtains better clustering results than constrained mixmnl, where mixmnl stands for the multinomial model-based EM algorithm.
Text Clustering; Semi-supervised Clustering; LDA; Word Distribution
Ping Zhou, Jiayin Wei, Yongbin Qin
College of Computer Science and Information, Guizhou University, Guiyang, 550000, China
International conference
Sanya
English
1024-1028
2013-06-21 (date first posted online on the Wanfang platform; not the paper's publication date)