Conference Topics

ASR Normalization for Machine Translation

Natural spoken language contains many meaningless modal particles and dittographs. In addition, ASR (automatic speech recognition) output often contains recognition errors and carries no punctuation. As a result, translation quality is poor when ASR results are passed directly to MT (machine translation). This paper introduces an ASR normalization approach for machine translation based on a maximum entropy sequential labeling model. Before translation, meaningless modal particles and dittographs are deleted, recognition errors are corrected, and the ASR results are punctuated. Experiments show an MT BLEU score of 0.2465, a 17.3% improvement over the MT baseline without normalization. These positive results confirm that ASR normalization is effective in improving translation quality for spoken language machine translation.

Spoken language machine translation; automatic speech recognition; maximum entropy model; normalization

Heyan Huang, Chong Feng, Jiande Wang, Xiaofei Zhang

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, 100081; Research Center of Computer and Language Information Engineering, CAS, Beijing, China, 100097

International conference

2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC 2010)

Nanjing

English

430-433

2010-08-26 (date first posted online on the Wanfang platform; not the paper's publication date)