Study on Tibetan-Chinese Comparable Corpus Extraction

Abstract:

Tibetan-Chinese comparable corpus extraction is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and related research. This paper explores how to address the scarcity of Tibetan-Chinese comparable corpora, which in turn would promote knowledge sharing between languages. We propose a method for extracting Tibetan-Chinese comparable corpora. The main contributions are: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; (2) an extraction method based on entity links in naturally annotated resources. The experimental results show that the approach is effective.

Keywords: Tibetan-Chinese comparable corpus multi-feature fusion algorithm

Author: Sun Yuan

Affiliation: School of Information Engineering, Minzu University of China; Minority Languages Branch, National Language Resource and Monitoring Research Center, Beijing 100081, China

Conference type: Domestic conference

Conference name: The 7th Conference on Social Computing (第七届社会计算会议)

Conference location: Fuzhou

Conference language: English

Pages: 1-5

Online publication date: 2015-12-11 (the date the record first appeared on the Wanfang platform; not the publication date of the paper)
