会议专题

Cross Language Information Retrieval Based On LDA

This paper proposed a LDA-based cross-language retrieval model that did not rely on word-by-word translation of query or document. Instead, a parallel corpus was used to estimate a cross-language LDA (Latent Dirichlet Allocation) model. We assumed that a topic variable Z in LDA could generate both an English token and a Chinese token, given that the parallel corpus contained two languages: English and Chinese. Therefore, the LDA model was easy to be extended to multi-language information retrieval as long as a multi-lingual parallel corpus was provided. The proposed LDA-based cross-language retrieval model was compared with three popular retrieval models: LDA-based mono-lingual document model; Mono-lingual TF.IDF retrieval model; Cross-lingual Latent Semantic Indexing retrieval model on CNKI datasets. Experimental results showed that this model was very effective and achieved very good performance.

LDA topic model cross language information retrieval

Ai Wang YaoDong Li Wei Wang

Key Laboratory of Complex System and Intelligence Science,Institute of Automation,Chinese Academy of Sciences

国际会议

2009 IEEE International Conference on Intelligent Computing and Intelligent Systems(2009 IEEE 智能计算与智能系统国际会议)

上海

英文

2300-2305

2009-11-20(万方平台首次上网日期,不代表论文的发表时间)