A Query Translation Disambiguation Method for Cross-Language Information Retrieval Based on Hybrid Corpora
Choosing among multiple candidate translations is a key problem in dictionary-based cross-language information retrieval (CLIR). This paper proposes a translation disambiguation method based on hybrid corpora. Given an English-Chinese bilingual dictionary for scientific and technological fields, a parallel corpus is used to attach translation probability values to the dictionary entries. With the help of toolkits such as GIZA++, the parallel corpus also helps address the problem of out-of-vocabulary (OOV) words. Another important question is how to evaluate CLIR systems on experimental corpora. To build test collections over a small-scale document set, two parameters are proposed: the coverage rate of user queries and the hit rate on document titles. The paper shows that choosing these two parameters appropriately can greatly improve the detection rate of information retrieval systems. The constructed test collection has already been used successfully to test CLIR systems, and the experimental results show that the proposed query translation and translation disambiguation methods for CLIR are feasible.
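The abstract does not specify how the translation probabilities are estimated or applied. The sketch below assumes one simple reading. P(zh | en) is estimated by relative frequency from word-aligned English-Chinese pairs, such as pairs one might extract from GIZA++ alignment output. Those estimates are attached to each dictionary candidate, and the most probable candidate is chosen for every query term. For OOV terms, the sketch falls back to corpus-derived alignments. The function names, the `floor` value for unseen pairs, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter, defaultdict


def estimate_translation_probs(aligned_pairs):
    """Relative-frequency estimate of P(zh | en) from word-aligned (en, zh) pairs.

    Assumption: alignment pairs have already been extracted (e.g. from a
    GIZA++ run); this function does not parse any toolkit's output format.
    """
    pair_counts = Counter(aligned_pairs)
    src_counts = Counter(en for en, _ in aligned_pairs)
    probs = defaultdict(dict)
    for (en, zh), n in pair_counts.items():
        probs[en][zh] = n / src_counts[en]
    return probs


def annotate_dictionary(dictionary, probs, floor=1e-6):
    """Attach a probability to every dictionary candidate.

    Candidates never seen in the parallel corpus get a small hypothetical
    floor value so they stay selectable but rank below attested translations.
    """
    return {
        en: {zh: probs.get(en, {}).get(zh, floor) for zh in candidates}
        for en, candidates in dictionary.items()
    }


def translate_query(terms, weighted_dict, probs):
    """Pick the most probable translation for each query term.

    Dictionary entries are used first; OOV terms fall back to corpus
    alignments; terms with no evidence at all are left untranslated.
    """
    result = []
    for term in terms:
        candidates = weighted_dict.get(term) or probs.get(term)
        if candidates:
            result.append(max(candidates, key=candidates.get))
        else:
            result.append(term)
    return result


if __name__ == "__main__":
    aligned = [("retrieval", "检索"), ("retrieval", "检索"),
               ("retrieval", "恢复"), ("query", "查询"), ("corpus", "语料库")]
    dictionary = {"retrieval": ["恢复", "检索", "取回"], "query": ["查询", "询问"]}
    probs = estimate_translation_probs(aligned)
    weighted = annotate_dictionary(dictionary, probs)
    print(translate_query(["retrieval", "query", "corpus", "giza"], weighted, probs))
```

In this toy run, "retrieval" resolves to 检索 because that pair is aligned twice as often as 恢复. "corpus" is absent from the dictionary, so it is translated only through the corpus alignments. A real system would typically weight or keep several candidates instead of taking a hard argmax.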
Keywords: translation disambiguation; translation probability; parallel corpus; test collections
Yingfan GAO, Yanqing HE, Hongjiao XU, Huilin WANG
Institute of Scientific and Technical Information of China, Beijing, China, 100038
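Venue type: International conference
Location: Beijing
Language: English
Pages: 202-205
2011-06-07 (date first posted online on the Wanfang platform; not the paper's publication date)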