Monaural Voiced Speech Segregation based on Combined Cues and Energy Distribution
Monaural speech segregation is important for speech signal processing and has been studied extensively on the basis of auditory scene analysis principles. However, current segregation algorithms cannot achieve satisfactory performance in the high-frequency range. In this paper, we propose a system for monaural voiced speech segregation in which two novel ideas are investigated. First, combined cues (including cross-channel correlation, temporal continuity, and onset/offset) are employed to generate segments in the high-frequency range. Second, the energy distribution of the mixed signal is used to indicate the reliability of these cues in the high-frequency range, and an alternative segmentation strategy is applied accordingly. Systematic evaluation and comparison show that the proposed system improves SNR gain.
Liheng Zhao, Zengfu Wang
Department of Automation, University of Science and Technology of China, Hefei 230027, P. R. China
International conference
Shanghai
English
57-63
2010-10-20 (date first posted online on the Wanfang platform; not the paper's publication date)