Extracting Historical Terms Based on Aligned Chinese-English Parallel Corpora

Abstract:

This paper examines the feasibility of applying statistics-based term extraction and evaluation methods to extract historical terms from aligned parallel corpora of Chinese historical classics and their translations. It proposes using transliterations as anchor points to establish sentence-level alignment. It also investigates an approach to extracting term translation pairs from 4000 parallel sentences or sentence segments drawn from the Chinese historical classic Shi Ji (Records of the Historian) and its English translations by two well-known translators. The experimental results indicate that the statistically sound algorithm successfully extracts transliterated pairs and terms whose English translations are consistent throughout the corpus, but fails to extract the translations of terms that the two translators render differently, even though both renderings may be equally acceptable English usage. The algorithm also fails to extract the highest-frequency terms, which are ambiguous in meaning because their part of speech varies. The paper therefore suggests that insights from linguistics and translation studies can be integrated with the statistical measures to improve extraction and validation results.
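The abstract does not name the statistic used, so the following is only a minimal sketch under stated assumptions: it swaps in the Dice coefficient as one common co-occurrence measure, and the function names and toy sentence pairs are hypothetical. It is meant to illustrate why a consistently translated term can score highly while a term rendered differently by two translators cannot.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score (source term, target word) candidates with the Dice coefficient.

    pairs: list of (source_terms, target_words), one entry per aligned
    sentence pair. The Dice coefficient is an assumed stand-in; the
    abstract does not say which statistic the paper uses.
    """
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_terms, tgt_words in pairs:
        src_set, tgt_set = set(src_terms), set(tgt_words)
        src_count.update(src_set)   # sentence pairs containing each source term
        tgt_count.update(tgt_set)   # sentence pairs containing each target word
        joint_count.update(product(src_set, tgt_set))  # co-occurrences
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint_count.items()
    }


def best_score(scores, term):
    """Return the highest Dice score among the candidates for a source term."""
    return max(v for (s, _), v in scores.items() if s == term)


# Toy data invented for illustration; romanized placeholders stand in for
# the Chinese source terms and nothing here comes from the paper's corpus.
toy_pairs = [
    (["taishigong"], ["historian", "remarks"]),    # translator A
    (["taishigong"], ["historian", "comments"]),   # translator B, same rendering
    (["xiang"], ["chancellor", "arrived"]),        # translator A
    (["xiang"], ["minister", "left"]),             # translator B, different rendering
]

scores = dice_scores(toy_pairs)
consistent = best_score(scores, "taishigong")
inconsistent = best_score(scores, "xiang")
print(consistent, inconsistent)
```

In this toy setting the consistently rendered term reaches the maximum score, while the term with two competing renderings splits its co-occurrence counts and scores lower, which mirrors the failure case the abstract reports.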

Keywords: historical term extraction; parallel corpora; Chinese historical classics

Authors: Xiuying LI, Chao CHE, Limin HAN, Xiaoxia LIU

Affiliation: Dalian University of Technology, Dalian, Liaoning, China

Conference type: International conference

Conference name: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-6

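Online publication date: 2009-09-24 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)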
