Conference Topic

A Robust Transcription System for Soccer Video Database

This paper presents a robust approach to transcribing a soccer video database. By exploiting the audio channels in the video, spoken information is transcribed using a canonical speech recognition system. Because soccer videos vary in both speech quality and content, the transcription system faces three main problems: noisy data, interference from foreign terms, and emotional variation in speech prosody. One solution is proposed for each problem, respectively: a noise reduction scheme, a cross-lingual transliteration model, and an advanced acoustic modeling technique. The proposed methods are evaluated experimentally on the Vietnamese AFF Suzuki-cup database, which consists of over 14 hours of video. In the best case, the system reaches an accuracy rate of 83.3%.

Nhut M. Pham, Duc A. Duong, Quan H. Vu

University of Science, VNU-HCM, Vietnam

International conference

The 10th China Annual Conference on Virtual Reality

Shanghai

English

1066-1072

2010-10-20 (date first posted online on the Wanfang platform; not the paper's publication date)