Study on Tibetan-Chinese Comparable Corpus Extraction
Tibetan-Chinese comparable corpus extraction is foundational work for Tibetan-Chinese cross-language question answering, information retrieval, machine translation, and other research. This paper explores ways to address the scarcity of Tibetan-Chinese comparable corpora, which is expected to promote knowledge sharing between different languages. We propose a method for extracting Tibetan-Chinese comparable corpora. The main contributions are: (1) a Tibetan-Chinese comparable corpus extraction model based on multiple features of bilingual websites; and (2) an extraction method based on entity links in naturally annotated resources. Experimental results show that the approach is effective.
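The abstract does not say which features the model combines or how they are weighted. What follows is therefore only a generic sketch of multi-feature fusion for pairing Tibetan and Chinese pages from bilingual websites, and it should not be read as the paper's actual model. Every part of it is a hypothetical choice made for illustration: the three features (URL token overlap, publication-date proximity, and overlap of language-independent entity IDs, which are assumed to come from some earlier entity-linking step), the `WEIGHTS` values, the `threshold`, and the example URLs. Each feature is mapped to [0, 1], the features are combined linearly, and every Tibetan page keeps its best-scoring Chinese candidate if that candidate's score clears the threshold.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    """A crawled web page (hypothetical structure for illustration)."""
    url: str
    published: date
    entities: frozenset  # language-independent entity IDs, assumed to come from entity linking


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def url_tokens(url: str) -> set:
    """Split a URL into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", url.lower()))


def date_score(d1: date, d2: date) -> float:
    """1.0 for the same day, decaying toward 0 as publication dates drift apart."""
    return 1.0 / (1.0 + abs((d1 - d2).days))


# Hypothetical feature weights (sum to 1); a real system would tune them on labelled pairs.
WEIGHTS = {"url": 0.3, "date": 0.3, "entity": 0.4}


def fused_score(bo: Page, zh: Page) -> float:
    """Linear fusion of several comparability features into one score in [0, 1]."""
    features = {
        "url": jaccard(url_tokens(bo.url), url_tokens(zh.url)),
        "date": date_score(bo.published, zh.published),
        "entity": jaccard(set(bo.entities), set(zh.entities)),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def extract_pairs(bo_pages, zh_pages, threshold=0.5):
    """For each Tibetan page, keep the best-scoring Chinese page if it clears the threshold."""
    pairs = []
    for bo in bo_pages:
        best = max(zh_pages, key=lambda zh: fused_score(bo, zh), default=None)
        if best is not None and fused_score(bo, best) >= threshold:
            pairs.append((bo.url, best.url))
    return pairs


if __name__ == "__main__":
    # Toy data: "bo" and "zh" are the ISO 639-1 codes for Tibetan and Chinese.
    bo_pages = [Page("http://example.org/bo/news/2015/1211/42.html",
                     date(2015, 12, 11), frozenset({"Q1", "Q2"}))]
    zh_pages = [
        Page("http://example.org/zh/news/2015/1211/42.html",
             date(2015, 12, 11), frozenset({"Q1", "Q2", "Q3"})),
        Page("http://example.org/zh/sports/2015/0301/7.html",
             date(2015, 3, 1), frozenset({"Q9"})),
    ]
    print(extract_pairs(bo_pages, zh_pages))
```

A linear combination was picked because it is easy to inspect. Each weight shows directly how much one feature contributes. A learned classifier trained on the same feature vector would be an equally reasonable design.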
Tibetan-Chinese; comparable corpus; multi-feature fusion; algorithm
Sun Yuan
School of Information Engineering, Minzu University of China; Minority Languages Branch, National Language Resource and Monitoring Research Center, Beijing 100081, China
Domestic conference
Fuzhou
English
pp. 1-5
2015-12-11 (date first posted online on the Wanfang platform; not the paper's publication date)