Conference topic

Study on Tibetan-Chinese Comparable Corpus Extraction

  Tibetan-Chinese comparable corpus extraction is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and related research. This paper explores how to address the scarcity of Tibetan-Chinese comparable corpora, with the aim of promoting knowledge sharing between different languages. We propose a method for extracting a Tibetan-Chinese comparable corpus. The main contributions are: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; and (2) an extraction method based on entity links from naturally annotated resources. Experimental results show that the approach is effective.

Tibetan-Chinese comparable corpus multi-feature fusion algorithm

Sun Yuan

School of Information Engineering, Minzu University of China; Minority Languages Branch, National Language Resource and Monitoring Research Center, Beijing 100081, China

Domestic conference

7th Social Computing Conference (第七届社会计算会议)

Fuzhou

English

1-5

2015-12-11 (date first posted online on the Wanfang platform; not the paper's publication date)