Feature Expansion for Microblogging Text based on Latent Dirichlet Allocation with User Feature
Traditional Topic Detection and Tracking (TDT) is based on large-scale news streams. However, with the development of new technology, microblogging platforms have become a new generation of platforms for information distribution and communication. Because microblogging text has many features that differ markedly from those of ordinary news reports, existing TDT methods become ineffective. We present a new framework based on U-LDA (Latent Dirichlet Allocation with User Feature), which takes into account user features on the microblogging platform. We expand the features of short microblogging texts using the U-LDA model, which improves the precision of TDT tasks. In this paper, we discuss and summarize the particular features of microblogging text and present a method that incorporates user features into the LDA model; on this basis, we propose a general TDT framework based on the U-LDA model. By applying the new model to a microblogging corpus, we conclude that U-LDA is more effective than LDA.
TDT; LDA model; user features; short text
Wei Xia, Yanxiang He, Ye Tian, Qiang Chen, Lu Lin
School of Computer, Wuhan University, Wuhan, China
International conference
Chongqing
English
228-232
2011-08-20 (date first posted online on the Wanfang platform; does not represent the paper's publication date)