Conference Paper

Feature Expansion for Microblogging Text based on Latent Dirichlet Allocation with User Feature

Traditional Topic Detection and Tracking (TDT) is based on large-scale news streams. However, with the development of new technology, microblogging platforms have become a new generation of platforms for information distribution and communication. Because microblogging text has many features that differ sharply from those of ordinary news reports, established TDT methods become ineffective. We present a new framework based on U-LDA (Latent Dirichlet Allocation with User Feature), which takes into account user features on the microblogging platform. We use the U-LDA model to expand the features of short microblogging texts, which improves the precision of TDT tasks. In this paper, we discuss and summarize the particular features of microblogging text, present a method that incorporates user features into the LDA model, and on that basis propose a general TDT framework based on the U-LDA model. Applying the new model to a microblogging corpus, we conclude that U-LDA is more effective than LDA.

TDT; LDA model; user features; short text

Wei Xia, Yanxiang He, Ye Tian, Qiang Chen, Lu Lin

School of Computer, Wuhan University, Wuhan, China

International conference

2011 6th Joint International Information Technology and Artificial Intelligence Conference (IEEE ITAIC 2011)

Chongqing

English

228-232

2011-08-20 (date first posted online on the Wanfang platform; does not represent the paper's publication date)