Utilizing Auditory Masking in Automatic Speech Recognition

Abstract:

A speech recognition system based on the psychoacoustics of the masking property of the human auditory system is proposed. The method uses several psychoacoustic properties of human perception to define a perceptual speech excitation function (the masking threshold) and the perceptual noise. Based on the auditory masking threshold (AMT), a time-frequency noise spectral subtraction is implemented. For a human listener, noise below the masking threshold is inaudible, so the objective is to minimize only the noise spectrum above the masking threshold. Additionally, we show that, for ASR applications, further improvements in recognition performance may be obtained by augmenting the masking of the noise with spectral subtraction in the masked region as well. The strategy is to remove the masked noise from the ASR system, similar to the masking effect in the human auditory system. Based on the AMT and the estimated perceptual noise, we have implemented two spectral subtraction algorithms: a straightforward scheme that subtracts the total estimated perceptual noise from the noisy speech spectrum, and a spectral subtraction of the noise that lies below the masking threshold. Both methods were observed to give significant improvements over the baseline PLP performance, with the latter method giving better recognition results.
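The two subtraction schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the function names are hypothetical, the inputs are per-bin power spectra for one frame, the masking threshold is taken as a precomputed input rather than derived, and negative results are floored at zero as is common in spectral subtraction.

```python
import numpy as np

def subtract_total_noise(noisy_power, noise_power, floor=0.0):
    """Scheme 1 (assumed form): subtract the full noise estimate in every bin."""
    return np.maximum(noisy_power - noise_power, floor)

def subtract_masked_noise(noisy_power, noise_power, masking_threshold, floor=0.0):
    """Scheme 2 (assumed form): subtract the noise estimate only in bins
    where the estimated noise lies below the masking threshold."""
    masked = noise_power < masking_threshold
    cleaned = np.where(masked, noisy_power - noise_power, noisy_power)
    return np.maximum(cleaned, floor)

# Hypothetical three-bin frame, for illustration only.
noisy = np.array([10.0, 5.0, 2.0])
noise = np.array([1.0, 3.0, 4.0])
threshold = np.array([2.0, 2.0, 5.0])

total = subtract_total_noise(noisy, noise)
masked_only = subtract_masked_noise(noisy, noise, threshold)
```

In this toy frame the second bin is left untouched by the second scheme because its noise estimate exceeds the threshold; how the paper actually treats such bins is not stated in the abstract.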

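Author: Serajul Haque

Affiliation: IEEE

Conference type: International conference

Conference name: 第十届中国虚拟现实年会 (10th China Virtual Reality Annual Conference)

Conference location: Shanghai

Conference language: English

Pages: 1758-1764

Online publication date: 2010-10-20 (date first made available on the Wanfang platform; not the paper's publication date)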
