ASR Normalization for Machine Translation

Abstract:

Natural spoken language contains many meaningless modal particles and dittographs (unintentionally repeated words). Furthermore, ASR (automatic speech recognition) output often contains recognition errors and has no punctuation. As a result, translation quality is rather poor when ASR results are passed directly to MT (machine translation). This paper introduces an ASR normalization approach for machine translation based on a maximum entropy sequential labeling model. Before translation, meaningless modal particles and dittographs are deleted, recognition errors are corrected, and the ASR results are punctuated. Experiments show that the approach obtains an MT BLEU score of 0.2465, an improvement of 17.3% over the MT baseline without normalization. These positive results confirm that ASR normalization is effective in improving translation quality for spoken language machine translation.
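
The normalization step described in the abstract can be pictured as applying one label per ASR token. The following is a minimal sketch under stated assumptions, not the authors' implementation: the tag names (`KEEP`, `DEL`, and a `+PERIOD`-style punctuation suffix), the function `apply_labels`, and the English example sentence are all hypothetical, and the labels are written by hand here, whereas in the paper they would come from a trained maximum entropy sequential labeling model. Recognition-error correction is not covered by this sketch.

```python
from typing import List

# Hypothetical punctuation suffixes; the paper's actual tag set is not given in the abstract.
PUNCT = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def apply_labels(tokens: List[str], labels: List[str]) -> List[str]:
    """Apply per-token labels: KEEP or DEL, optionally followed by '+<PUNCT>'."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    out: List[str] = []
    for token, label in zip(tokens, labels):
        action, _, punct = label.partition("+")
        if action == "KEEP":
            out.append(token)
        elif action != "DEL":
            raise ValueError(f"unknown action: {action!r}")
        if punct:
            # Insert the punctuation mark after the current position.
            out.append(PUNCT[punct])
    return out


# Illustrative input: a filler word and a repeated word are deleted,
# and a sentence-final period is inserted.
tokens = ["um", "I", "I", "want", "to", "go", "home"]
labels = ["DEL", "DEL", "KEEP", "KEEP", "KEEP", "KEEP", "KEEP+PERIOD"]
normalized = apply_labels(tokens, labels)
```

Framing deletion and punctuation as labels on the existing tokens keeps the output aligned with the ASR input, which is what makes a sequential labeling model a natural fit for this task.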

Keywords: spoken language machine translation; automatic speech recognition; maximum entropy model; normalization

Authors: Heyan Huang, Chong Feng, Jiande Wang, Xiaofei Zhang

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, 100081; Research Center of Computer and Language Information Engineering, CAS, Beijing, China, 100097

Conference type: International conference

Conference name: 2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC 2010)

Conference location: Nanjing

Conference language: English

Pages: 430-433

Online publication date: 2010-08-26 (date first posted on the Wanfang platform; this does not represent the paper's publication date)
