Conference Topic

Utilizing Auditory Masking in Automatic Speech Recognition

A speech recognition system based on the psychoacoustics of the masking property of the human auditory system is proposed. The method uses several psychoacoustic properties of human perception to define a perceptual speech excitation function (the masking threshold) and the perceptual noise. Based on the auditory masking threshold (AMT), a time-frequency spectral subtraction of the noise is implemented. For a human listener, noise below the masking threshold is inaudible, so the objective is to minimize only the noise spectrum above the masking threshold. Additionally, we show that, for ASR applications, further improvements in recognition performance may be obtained by also applying spectral subtraction to the noise in the masked region, reinforcing its masking. The strategy is to remove the masked noise from the input to the ASR system, analogous to the masking effect in the human auditory system. Based on the AMT and the estimated perceptual noise, we have implemented two spectral subtraction algorithms: a straightforward scheme that subtracts the total estimated perceptual noise from the noisy speech spectrum, and a spectral subtraction of only the noise that lies below the masking threshold. Both methods were observed to give significant improvements over the baseline PLP (perceptual linear prediction) performance, with the latter method giving better recognition results.
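The two subtraction schemes compared above can be sketched per frequency bin as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The function names, the power-spectrum inputs, and the spectral floor `floor` are all hypothetical. Reading "noise which lies below the masking threshold" as `min(noise, threshold)` in each bin is also an assumption. The masking threshold is taken as a precomputed input rather than derived from a psychoacoustic model.

```python
from typing import List


def subtract_total_noise(noisy: List[float], noise: List[float],
                         floor: float = 0.01) -> List[float]:
    """Scheme 1 (hypothetical sketch): subtract the whole estimated noise
    power from each frequency bin of the noisy power spectrum.

    A spectral floor (a fraction `floor` of the noisy power) keeps every
    bin positive when the noise estimate exceeds the observed power.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy, noise)]


def subtract_masked_noise(noisy: List[float], noise: List[float],
                          threshold: List[float],
                          floor: float = 0.01) -> List[float]:
    """Scheme 2 (hypothetical sketch, one possible reading): subtract only
    the part of the estimated noise power that lies at or below the
    masking threshold in each bin.

    `threshold` is assumed to be a precomputed masking threshold on the
    same power scale and frequency grid as `noisy` and `noise`.
    """
    out = []
    for y, n, t in zip(noisy, noise, threshold):
        masked_part = min(n, t)  # noise power at or below the threshold
        out.append(max(y - masked_part, floor * y))
    return out


if __name__ == "__main__":
    noisy = [1.0, 0.5, 2.0]      # toy noisy-speech power per bin
    noise = [0.2, 0.6, 0.5]      # toy noise-power estimate per bin
    threshold = [0.1, 1.0, 0.3]  # toy masking threshold per bin
    print(subtract_total_noise(noisy, noise))
    print(subtract_masked_noise(noisy, noise, threshold))
```

Under this reading the second scheme never removes more power from a bin than the first, since `min(noise, threshold) <= noise`.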

Serajul Haque

IEEE

International Conference

The 10th China Annual Conference on Virtual Reality

Shanghai

English

1758-1764

2010-10-20 (date first posted online on the Wanfang platform; not the paper's publication date)