Automatic Speech Sentence Segmentation from Multi-paragraph Databases*

Abstract:

A speech sentence is the input to automatic phonetic segmentation or transcription. This paper discusses our efforts on automatic speech sentence segmentation from multi-paragraph speech databases, with the aim of building Text-To-Speech (TTS) system speech corpora automatically. We present a) a system for automatic speech sentence segmentation from broadcast audio based on a forced alignment technique, in which a checking mechanism based on speech recognition is also used, b) an iterative algorithm to improve the system, and c) a music detector based on a combination of a Variable Duration Hidden Markov Model (VDHMM) and a Gaussian Mixture Model (GMM). Experiments show that the improved system achieves a Sentence Accurate Rate (SAR) of 98.93% and generates 646 correct sentences, compared with an SAR of 97.85% and 155 correct sentences for the original system.

Keywords: speech process; automatic speech sentence segmentation; multi-paragraph speech database

Authors: ZHANG Wei, PANG Minhui, DU Ranran, LIU Yayu

Affiliation: Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China

Conference type: International conference

Conference name: 2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA 2010)

Conference location: Changsha

Conference language: English

Pages: 721-724

Online publication date: 2010-03-13 (date first posted on the Wanfang platform; not the paper's publication date)
