A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights

Abstract:

Semi-supervised text clustering, a research branch of text clustering, uses limited prior knowledge to guide the otherwise unsupervised clustering process and help users obtain better clustering results. Because labeled data are difficult, expensive, and time-consuming to obtain, it is important to use the available supervised information effectively to significantly improve clustering performance. This paper proposes a semi-supervised LDA text clustering algorithm based on word distribution weights (WWDLDA). By introducing word distribution coefficients obtained from labeled data, the LDA model can be applied to semi-supervised clustering. During clustering, the coefficients continually adjust the word distribution and thereby change the clustering results. Experimental results on real data sets show that the proposed algorithm achieves better clustering results than constrained mixmnl, where mixmnl stands for the multinomial model-based EM algorithm.
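
The abstract does not give the formula for the word distribution coefficients, so the following is only a minimal sketch of the general idea, not the WWDLDA procedure itself. It assumes, purely for illustration, that a coefficient is the smoothed ratio of a word's frequency within one labeled class to its frequency over all labeled data, and that the coefficients rescale a topic-word distribution that is then renormalized. The function names `word_weights` and `reweight` and the `smoothing` parameter are hypothetical.

```python
from collections import Counter


def word_weights(labeled_docs, num_topics, smoothing=1.0):
    """Estimate one coefficient per (topic, word) from labeled documents.

    labeled_docs: list of (tokens, topic_label) pairs.
    Assumed definition (not taken from the paper): the coefficient is the
    smoothed in-class relative frequency of a word divided by its smoothed
    relative frequency over all labeled data.
    """
    vocab = sorted({w for tokens, _ in labeled_docs for w in tokens})
    per_topic = [Counter() for _ in range(num_topics)]
    overall = Counter()
    for tokens, label in labeled_docs:
        per_topic[label].update(tokens)
        overall.update(tokens)

    total = sum(overall.values())
    vocab_size = len(vocab)
    weights = []
    for k in range(num_topics):
        n_k = sum(per_topic[k].values())
        w_k = {}
        for w in vocab:
            p_w_given_k = (per_topic[k][w] + smoothing) / (n_k + smoothing * vocab_size)
            p_w = (overall[w] + smoothing) / (total + smoothing * vocab_size)
            w_k[w] = p_w_given_k / p_w
        weights.append(w_k)
    return weights


def reweight(topic_word, weights):
    """Scale each topic's word distribution by its coefficients, then renormalize.

    topic_word: list of dicts (one per topic) mapping word -> probability.
    Words without a coefficient keep weight 1.0.
    """
    adjusted = []
    for phi_k, w_k in zip(topic_word, weights):
        scaled = {w: p * w_k.get(w, 1.0) for w, p in phi_k.items()}
        z = sum(scaled.values())
        adjusted.append({w: v / z for w, v in scaled.items()})
    return adjusted
```

In a sampler-based LDA implementation, a step like `reweight` could be applied to the current topic-word estimate at each iteration, which is one way to read the statement that the coefficients keep adjusting the word distribution during clustering.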

Keywords: Text Clustering; Semi-supervised Clustering; LDA; Word Distribution

Authors: Ping Zhou, Jiayin Wei, Yongbin Qin

Affiliation: College of Computer Science and Information, Guizhou University, Guiyang, 550000, China

Conference type: International conference

Conference name: 2013 International Conference on Education Technology and Information Systems (ICETIS2013)

Conference location: Sanya

Conference language: English

Pages: 1024-1028

Online publication date: 2013-06-21 (date first made available online on the Wanfang platform; not necessarily the paper's publication date)
