Transcribing Southern Min Speech Corpora with a Web-Based Language Learning System

Abstract:

This paper proposes a human-computation-based scheme for transcribing Southern Min speech corpora. The core idea is to implement a Web-based language learning system that collects orthographic and phonetic labels from a large number of language learners and selects the most commonly entered labels as the transcriptions of the corpora. The scheme is essentially a form of distributed knowledge acquisition. Computer-aided mechanisms are also used to verify the collected transcriptions. The benefit of the scheme is that it makes the transcription task neither tedious nor costly: no significant budget is required to transcribe large corpora. The design of a system for transcribing Min Nan speech corpora is described in detail. Application of a prototype version of the system shows that this transcription scheme is an effective and economical way to generate orthographic and phonetic transcriptions.
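The label-selection idea in the abstract (keep the label that learners enter most often) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `choose_transcription`, the `min_votes` and `min_share` thresholds, and the sample labels are all hypothetical and serve only to show simple majority voting over learner input.

```python
from collections import Counter


def choose_transcription(labels, min_votes=2, min_share=0.5):
    """Pick the most commonly entered label for one utterance.

    Hypothetical sketch: the thresholds are assumptions, not values
    taken from the paper. Returns None when agreement is too weak.
    """
    # Ignore empty submissions and surrounding whitespace.
    cleaned = [label.strip() for label in labels if label.strip()]
    if not cleaned:
        return None

    # Count identical labels and take the most frequent one.
    label, votes = Counter(cleaned).most_common(1)[0]

    # Require a minimum number of votes and a minimum share of all input.
    if votes < min_votes or votes / len(cleaned) < min_share:
        return None
    return label


# Made-up learner input for a single utterance.
submissions = ["tai-gi", "tai-gi ", "tai-gu"]
print(choose_transcription(submissions))
```

An utterance for which the function returns `None` would simply stay open for more learner input, or be passed to whatever verification step the system uses.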

Authors: Jun Cai, Jacques Feldmar, Yves Laprie, Dominique Fohr, Jean-Paul Haton

Affiliations: Groupe Parole, LORIA-CNRS & INRIA, BP 239, 54600 Vandoeuvre-les-Nancy, France Dept. of Cognitive Scie Groupe Parole, LORIA-CNRS & INRIA, BP 239, 54600 Vandoeuvre-les-Nancy, France

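Conference type: International conference

Conference name: 2008 International Conference on Audio, Language and Image Processing

Conference location: Zhenjiang

Conference language: English

Pages: 659-664

Online publication date: 2008-07-07 (date first posted on the Wanfang platform; not the paper's publication date)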
