Short Text Feature Extraction and Clustering for Web Topic Mining

Abstract:

This paper introduces an algorithm, based on Chinese chunks, for clustering Chinese short texts in order to mine web topics. Targeting the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks, which reflect the semantic structure of the text and the dependencies between characters. The RPCL algorithm is then applied to cluster the texts with high precision, without needing to know the exact number of clusters in advance. Experimental results show that, compared with traditional methods, this approach remarkably reduces dimensionality and effectively improves clustering performance on Chinese short texts.
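
The two stages named in the abstract can be illustrated with a minimal sketch. It assumes that RPCL refers to rival penalized competitive learning and that chunks are approximated by overlapping character N-grams; the paper's actual chunk selection, distance measure, and learning rates are not given in this record, so the names `char_ngrams`, `ngram_vector`, and `rpcl_step` and the rates `lr_win` and `lr_rival` are hypothetical choices for illustration only.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_vector(text, vocabulary, n=2):
    """Count how often each vocabulary n-gram occurs in the text."""
    counts = Counter(char_ngrams(text, n))
    return [float(counts[g]) for g in vocabulary]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl_step(x, centers, wins, lr_win=0.05, lr_rival=0.002):
    """Apply one RPCL update in place and return (winner, rival) indices.

    centers: list of at least two center vectors.
    wins: per-center win counts, initialised to 1 (assumption) so the
          frequency weighting below is well defined.
    """
    total = sum(wins)
    # Frequency-weighted distance: centers that win often are handicapped.
    scores = [(wins[j] / total) * sq_dist(x, c) for j, c in enumerate(centers)]
    order = sorted(range(len(centers)), key=scores.__getitem__)
    winner, rival = order[0], order[1]
    # The winner moves toward the sample.
    centers[winner] = [c + lr_win * (xi - c) for c, xi in zip(centers[winner], x)]
    # The rival (second best) is pushed away with a much smaller rate.
    centers[rival] = [c - lr_rival * (xi - c) for c, xi in zip(centers[rival], x)]
    wins[winner] += 1
    return winner, rival
```

Repeating `rpcl_step` over the feature vectors of all texts tends to drive surplus centers away from the data, which is why this family of methods can start with more centers than there are clusters. Every remaining center is left unchanged in a given step.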

Authors: Hui He, Bo Chen, Weiran Xu, Jun Guo

Affiliation: School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, P.R. China, 100876

Conference type: International conference

Conference name: Third International Conference on Semantics, Knowledge, and Grid (SKG 2007)

Conference location: Xi'an

Conference language: English

Online publication date: 2007-10-29 (the date the record first went online on the Wanfang platform; not the paper's publication date)

Conference topic:

Short Text Feature Extraction and Clustering for Web Topic Mining