Cross Language Information Retrieval Based On LDA
This paper proposes an LDA-based cross-language retrieval model that does not rely on word-by-word translation of queries or documents. Instead, a parallel corpus is used to estimate a cross-language LDA (Latent Dirichlet Allocation) model. Given a parallel corpus in two languages, English and Chinese, we assume that a topic variable Z in LDA can generate both an English token and a Chinese token. The model can therefore be extended readily to multi-language information retrieval, provided that a multilingual parallel corpus is available. On CNKI datasets, the proposed model is compared with three popular retrieval models: an LDA-based monolingual document model, a monolingual TF-IDF retrieval model, and a cross-lingual Latent Semantic Indexing retrieval model. Experimental results show that the proposed model is effective and performs well.
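The idea of a shared topic variable can be illustrated with a minimal sketch. It assumes a query-likelihood scoring rule, log P(q|d) = sum over query words w of log sum over topics z of P(w|z) P(z|d), which the abstract does not confirm as the paper's ranking function. All names and numbers below (PHI_EN, THETA, the toy vocabulary, the floor value) are hypothetical and chosen only for illustration. In practice the topic-word and document-topic distributions would be estimated from the parallel corpus.

```python
import math

# Hypothetical toy parameters, not taken from the paper.
# Two topics shared across languages: 0 = finance, 1 = sports.
# PHI_EN[z][w] = P(English word w | topic z).
PHI_EN = [
    {"bank": 0.6, "market": 0.3, "match": 0.1},
    {"bank": 0.1, "market": 0.1, "match": 0.8},
]

# THETA[d][z] = P(topic z | document d). Assumed to have been inferred
# from each document's Chinese tokens by the cross-language model.
THETA = {
    "doc_finance": [0.9, 0.1],
    "doc_sports": [0.2, 0.8],
}


def query_log_likelihood(query_tokens, theta_d, phi_en, floor=1e-9):
    """Return log P(q | d) = sum_w log sum_z P(w | z) * P(z | d)."""
    score = 0.0
    for word in query_tokens:
        # Marginalise over topics; unseen words contribute zero mass.
        p_word = sum(phi_z.get(word, 0.0) * p_z
                     for phi_z, p_z in zip(phi_en, theta_d))
        # Floor the probability so an unseen word does not give log(0).
        score += math.log(max(p_word, floor))
    return score


def rank(query_tokens, theta, phi_en):
    """Rank document ids by descending query log-likelihood."""
    scores = {doc: query_log_likelihood(query_tokens, theta_d, phi_en)
              for doc, theta_d in theta.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because the English query and the Chinese documents meet only in topic space, no query or document translation step appears in the scoring function. For example, `rank(["bank", "market"], THETA, PHI_EN)` places `doc_finance` ahead of `doc_sports`.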
LDA; topic model; cross-language information retrieval
Ai Wang, YaoDong Li, Wei Wang
Key Laboratory of Complex System and Intelligence Science, Institute of Automation, Chinese Academy of Sciences
International conference
Shanghai
English
2300-2305
2009-11-20 (date first posted online on the Wanfang platform; not the paper's publication date)