Statistical Machine Translation Based on LDA

Abstract:

Current statistical machine translation (SMT) systems translate one sentence at a time and ignore document-level information. Consequently, translation models are learned only at the sentence level, and document context is generally overlooked. In this paper, we introduce document topic information to help an SMT system produce target sentences. First, the parallel training corpus, which has underlying document boundaries, is segmented into multiple documents, and a monolingual Latent Dirichlet Allocation (LDA) model is used to determine which topics these documents belong to. Next, the background phrase table is enhanced with the probability distribution of a document over topics. Evaluation shows that the proposed approach significantly improves the BLEU score on Chinese-to-English machine translation.
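
The abstract does not state how the phrase table is combined with the document-topic distribution. As a rough illustration only, the sketch below assumes one plausible scheme: each phrase pair has a hypothetical per-topic probability p(e | f, z), which is marginalized over the document's topic distribution p(z | d) and then linearly interpolated with the background probability. The function names, the interpolation weight, and all numbers are invented for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def topic_adapted_prob(topic_probs: List[float], doc_topics: List[float]) -> float:
    """Marginalize over topics: sum_z p(e | f, z) * p(z | d)."""
    if len(topic_probs) != len(doc_topics):
        raise ValueError("topic_probs and doc_topics must have the same length")
    return sum(p * w for p, w in zip(topic_probs, doc_topics))


def enhance_phrase_table(
    background: Dict[PhrasePair, float],
    per_topic: Dict[PhrasePair, List[float]],
    doc_topics: List[float],
    weight: float = 0.5,
) -> Dict[PhrasePair, float]:
    """Interpolate background scores with topic-adapted scores.

    `weight` is a hypothetical interpolation weight (an assumption,
    not a value from the paper). Phrase pairs without per-topic
    statistics keep their background probability unchanged.
    """
    enhanced: Dict[PhrasePair, float] = {}
    for pair, bg_prob in background.items():
        if pair in per_topic:
            adapted = topic_adapted_prob(per_topic[pair], doc_topics)
            enhanced[pair] = (1.0 - weight) * bg_prob + weight * adapted
        else:
            enhanced[pair] = bg_prob
    return enhanced


# Toy data (invented): two topics, e.g. topic 0 = sports, topic 1 = crime.
background = {("打", "play"): 0.5, ("打", "hit"): 0.5}
per_topic = {("打", "play"): [0.8, 0.2], ("打", "hit"): [0.2, 0.8]}
doc_topics = [0.9, 0.1]  # p(z | d) as an LDA model might infer for a sports document

enhanced = enhance_phrase_table(background, per_topic, doc_topics)
```

In this toy setting, a document that is mostly about the first topic shifts probability mass toward the translation favored by that topic, which is the intuition the abstract describes.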

Keywords: SMT; LDA; Adaptation; Document

Authors: Gong Zhengxian, Zhang Yu, Zhou Guodong

Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, China

Conference type: International conference

Conference name: 2010 4th International Universal Communication Symposium (IUCS 2010)

Conference location: Beijing

Conference language: English

Pages: 285-289

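Online publication date: 2010-10-18 (the date the record first appeared on the Wanfang platform, not necessarily the paper's publication date)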
