Joint Tokenization, Parsing, and Translation

Abstract:

Natural language processing is all about ambiguity. In machine translation, tokenization and parsing mistakes caused by segmentation and structural ambiguities can introduce translation errors. A well-known remedy is to offer more alternatives through compact representations such as lattices and forests. In this talk, I will introduce a technique that goes beyond lattices and forests by integrating tokenization, parsing, and translation in a single system. As a result, the three tasks can interact with and benefit each other within a discriminative framework. Experimental results show that this integration significantly improves tokenization and translation performance.

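Author: Yang Liu

Affiliation: Institute of Computing Technology (ICT), Chinese Academy of Sciences

Conference type: International conference

Conference: 2010 4th International Universal Communication Symposium (IUCS 2010)

Location: Beijing

Conference language: English

Pages: 1

Online publication date: 2010-10-18 (date first posted on the Wanfang platform; not necessarily the paper's publication date)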

Conference session:

Joint Tokenization, Parsing, and Translation