Translation Table Compression under End-Tagged Dense Code
In recent years, the quality of Phrase-Based Statistical Machine Translation has improved dramatically, partly because of the significant growth in available parallel corpora. In terms of space, however, this advantage becomes a disadvantage: a larger parallel corpus implies an exponential increase in the size of the translation tables. Existing solutions reduce the size of the translation tables by limiting the length of the sentences that are incorporated into them. This saves space, but at the risk of worsening the translation of long sentences. In this paper, we propose compressing phrase-based translation tables by using End-Tagged Dense Code to encode the phrases in the source and target languages. This technique reduces the size of the translation tables and therefore makes it possible to add longer sentences.
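The abstract does not say how End-Tagged Dense Code is applied to the table entries, so the following is only a minimal sketch of the general word-based scheme as it is usually described: symbols are ranked by frequency, each codeword is a byte sequence, and only the final byte has its high bit set. The function names (etdc_encode, etdc_decode, build_codebook) and the toy token list are hypothetical and are not taken from the paper.

```python
from collections import Counter

def etdc_encode(rank):
    """Return the codeword (bytes) for a 0-based frequency rank."""
    code = [128 + rank % 128]        # final byte: high bit set marks the end
    rank = rank // 128 - 1
    while rank >= 0:
        code.append(rank % 128)      # earlier bytes: high bit clear
        rank = rank // 128 - 1
    return bytes(reversed(code))

def etdc_decode(code):
    """Invert etdc_encode for a single, complete codeword."""
    rank = 0
    for b in code[:-1]:
        rank = (rank + b + 1) * 128
    return rank + code[-1] - 128

def build_codebook(tokens):
    """Assign shorter codewords to more frequent tokens (assumed ranking)."""
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    return {tok: etdc_encode(i) for i, tok in enumerate(ranked)}

# Toy usage: the 128 most frequent tokens would each get a one-byte codeword.
codebook = build_codebook("the house the red house the".split())
```

Because the end of every codeword is marked by the high bit, a compressed phrase can be split back into codewords without a separate length field. Whether the paper ranks words or whole phrases is not stated in this record.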
Tito Valencia, Lorena O. Cerdeira, Eva L. Iglesias, Francisco J. Rodriguez
Dept. of Computer Science, University of Vigo, Ourense, Spain
International conference
2010 4th International Universal Communication Symposium (IUCS 2010)
Beijing
English
305-310
2010-10-18 (date first posted online on the Wanfang platform; not the paper's publication date)