Automatic Speech Sentence Segmentation from Multi-paragraph Databases
Speech sentences are the input to automatic phonetic segmentation or transcription. This paper discusses our efforts on automatic speech sentence segmentation from multi-paragraph speech databases, with the aim of building Text-To-Speech (TTS) system speech corpora automatically. We present a) a system for automatic speech sentence segmentation from broadcast audio based on forced alignment, which also uses a checking mechanism based on speech recognition; b) an iterative algorithm to improve the system; and c) a music detector that combines a Variable Duration Hidden Markov Model (VDHMM) with a Gaussian Mixture Model (GMM). Experiments show that the improved system achieves a Sentence Accurate Rate (SAR) of 98.93% and generates 646 correct sentences, compared with an SAR of 97.85% and 155 correct sentences for the original system.
speech process; automatic speech sentence segmentation; multi-paragraph speech database
ZHANG Wei, PANG Minhui, DU Ranran, LIU Yayu
Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
International conference
Changsha
English
721-724
2010-03-13 (date first posted online on the Wanfang platform; not the paper's publication date)