Conference Topic

A Weighted Topical Document Embedding based Clustering Method for News Text

As an unsupervised machine learning method, clustering can preliminarily group text without manual labeling, which effectively accelerates the organization, abstraction and navigation of large news collections. News articles are long and contain many homonymous and polysemous words, which is one of the reasons that traditional text clustering methods perform poorly on news text. This paper presents a novel text representation method based on topical document embedding (TDE) to capture the semantic features of different topics. In the TDE representation, the document embedding of a news text is obtained by summing the word vectors from a Skip-Gram model, weighted by the TF-IDF scores of all the keywords in the text. The topical document embedding is then learned by joining the topic vectors obtained from an LDA model with the document vectors from the document embedding. Using the topical document embedding to perform clustering, we implement a novel text clustering method (TDE-TC). The experimental results show that news clustering based on the TDE representation outperforms clustering based on the bag-of-words model and the LDA model.
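The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "joining" means concatenation, uses a plain tf × log(N/df) TF-IDF variant, and substitutes hand-written toy values for the Skip-Gram word vectors and LDA topic vectors. All function and variable names (`tfidf_weights`, `document_embedding`, `topical_document_embedding`, `word_vectors`, `topic_vectors`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumed variant: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            w: (count / len(doc)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        })
    return weights


def document_embedding(doc_weights, word_vectors, dim):
    """Sum the word vectors, each scaled by its TF-IDF weight."""
    vec = [0.0] * dim
    for word, weight in doc_weights.items():
        wv = word_vectors.get(word)
        if wv is None:  # skip words without a vector
            continue
        for i in range(dim):
            vec[i] += weight * wv[i]
    return vec


def topical_document_embedding(doc_vec, topic_vec):
    """Join the topic vector and the document vector (assumed: concatenation)."""
    return list(topic_vec) + list(doc_vec)


# Toy inputs. In practice the word vectors would come from a trained
# Skip-Gram model and the topic vectors from a fitted LDA model.
docs = [
    ["stock", "market", "bank"],
    ["river", "bank", "water"],
]
word_vectors = {
    "stock": [1.0, 0.0], "market": [0.9, 0.1], "bank": [0.5, 0.5],
    "river": [0.0, 1.0], "water": [0.1, 0.9],
}
topic_vectors = [[0.8, 0.2], [0.1, 0.9]]  # hypothetical per-document topic mix

weights = tfidf_weights(docs)
tde = [
    topical_document_embedding(document_embedding(w, word_vectors, 2), t)
    for w, t in zip(weights, topic_vectors)
]
```

In this toy corpus the ambiguous word "bank" occurs in both documents, so its TF-IDF weight is zero and it contributes nothing to either document vector. The resulting `tde` vectors could then be passed to any standard clustering algorithm; the abstract does not state which one TDE-TC uses.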

Text Clustering; Skip-Gram; LDA; TF-IDF

Zhu Dechao, Song Hui

School of Computer Science, Donghua University, Shanghai, China

International conference

2016 IEEE 2nd Conference on Information Technology, Networking, Electronics and Automation Control

Chongqing

English

1060-1065

2016-03-20 (date first posted online on the Wanfang platform; not the paper's publication date)