Conference Topic

PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce

  PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. Due to the increasing prevalence of large datasets, there is a need to improve the scalability of computation in PLSA. In this paper, we propose a parallel PLSA algorithm called PPLSA to accommodate large corpus collections in the MapReduce framework. Our solution efficiently distributes computation and is relatively simple to implement.

Probabilistic Latent Semantic Analysis; MapReduce; EM; Parallel

Ning Li, Fuzhen Zhuang, Qing He, Zhongzhi Shi

The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese A

International conference

7th IFIP TC 12 International Conference (7th International Conference on Intelligent Information Processing (IIP 2012))

Guilin

English

40-49

2012-10-12 (date first posted online on the Wanfang platform; not the paper's publication date)