Conference Topic

A Query Translation Disambiguation Method for Cross-Language Information Retrieval Based on Hybrid Corpora

Disambiguating among multiple translation candidates is essential in dictionary-based cross-language information retrieval (CLIR). This paper proposes a translation disambiguation method based on hybrid corpora. Given an English-Chinese bilingual dictionary for scientific and technological fields, a parallel corpus is used to attach translation probability values to the dictionary entries. The parallel corpus also helps address out-of-vocabulary (OOV) words with the help of toolkits such as GIZA++. Evaluating a CLIR system on experimental corpora is another important question. To build test collections from a small-scale document set, the paper proposes two parameters, the coverage rate and the hit rate of user queries with respect to document titles, and shows that selecting these two parameters appropriately can greatly improve the detection rate of information retrieval systems. The resulting test collection has been used successfully to test CLIR systems, and the experimental results demonstrate that the paper's methods of query translation and translation disambiguation for CLIR are feasible.
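As an illustration only, the idea of attaching translation probabilities to dictionary entries can be sketched as below. This is not the authors' implementation: it assumes word-aligned pairs are already available (for example, extracted from the output of an aligner such as GIZA++), uses a plain relative-frequency estimate, and the names `estimate_translation_probs` and `translate_query` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_translation_probs(aligned_pairs):
    """Relative-frequency estimate of P(target | source) from aligned word pairs."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    probs = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        probs[src] = {tgt: c / total for tgt, c in tgt_counts.items()}
    return probs


def translate_query(query_terms, dictionary, probs):
    """For each term, choose the candidate translation with the highest probability.

    Candidates come from the bilingual dictionary; for terms missing from the
    dictionary (OOV), fall back to translations observed in the aligned corpus.
    Terms with no candidate at all are kept untranslated.
    """
    translated = []
    for term in query_terms:
        term_probs = probs.get(term, {})
        candidates = dictionary.get(term) or list(term_probs)
        if not candidates:
            translated.append(term)
            continue
        best = max(candidates, key=lambda t: term_probs.get(t, 0.0))
        translated.append(best)
    return translated


# Toy data, invented for illustration (assumption, not from the paper).
aligned = [("bank", "银行"), ("bank", "银行"), ("bank", "河岸"), ("corpus", "语料库")]
dictionary = {"bank": ["河岸", "银行"]}
probs = estimate_translation_probs(aligned)
result = translate_query(["bank", "corpus", "xyz"], dictionary, probs)
```

In this toy setup the dictionary term "bank" is resolved by corpus frequency, "corpus" is handled through the OOV fallback, and "xyz" is left untranslated because neither resource covers it.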

translation disambiguation; translation probability; parallel corpus; test collections

Yingfan GAO Yanqing HE Hongjiao XU Huilin WANG

Institute of Scientific and Technical Information of China, Beijing, China, 100038

International conference

International Council for Scientific and Technical Information Annual Conference (International Council for Scientific and Technical Information 2011 Summer Annual Conference, ICSTI 2011)

Beijing

English

202-205

2011-06-07 (date first posted online on the Wanfang platform; not the paper's publication date)