Speaker Recognition through Nonstationary Vector AR Model
Time-varying characteristic frequencies are extracted from the mean Mel cepstrum, which is obtained by Fourier analysis of speech signals. Two of these frequencies are selected, and the Mel cepstrum values at each one form a time series. Using time series preprocessing and mathematical statistics, the deterministic and stochastic components of each series are separated. The two stochastic components together form a bivariate time series. To further extract the parameters of each speaker's speech signal, a time-varying parameter vector AR (TVPVAR) model is established and analyzed. Using these parameters, speakers are identified on the basis of both the stochastic components and the residuals of the TVPVAR model. Speech from 10 speakers, 100 samples per speaker, is recorded, and each sample is selected in turn to be recognized. Experiments show that the recognition rate based on the residuals of the TVPVAR model (99.6%) is higher than the recognition rate based on the stochastic components (98.7%). This shows that the TVPVAR model is effective for analyzing autocovariance-nonstationary vector time series.
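The abstract does not give the model equations or the decision rule, so the following is only a rough sketch of the general idea: fit a vector AR model to a bivariate series, then assign a test series to the enrolled speaker whose model leaves residuals with the smallest mean Mahalanobis distance. It makes several assumptions that are not from the paper: the AR coefficients are held constant (an ordinary VAR, not the time-varying TVPVAR model), the data are synthetic, and the names `fit_var`, `enrol`, `identify`, and `simulate_var1` are hypothetical.

```python
import numpy as np


def design(x, p):
    """Regressor matrix [1, x_{t-1}, ..., x_{t-p}] for a series x of shape (T, k)."""
    T = x.shape[0]
    lags = [x[p - i - 1:T - i - 1] for i in range(p)]
    return np.hstack([np.ones((T - p, 1))] + lags)


def fit_var(x, p=1):
    """Least-squares fit of a constant-coefficient VAR(p); returns (coef, residuals)."""
    X, Y = design(x, p), x[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef, Y - X @ coef


def enrol(x, p=1):
    """Assumed speaker template: VAR coefficients plus residual mean and covariance."""
    coef, r = fit_var(x, p)
    return coef, r.mean(axis=0), np.cov(r, rowvar=False)


def mean_mahalanobis(r, mean, cov):
    """Average Mahalanobis distance of the rows of r from (mean, cov)."""
    d = r - mean
    inv = np.linalg.inv(cov)
    return float(np.mean(np.sqrt(np.einsum("ij,jk,ik->i", d, inv, d))))


def identify(x, models, p=1):
    """Return the label whose model gives the smallest mean residual distance."""
    scores = {}
    for label, (coef, mean, cov) in models.items():
        r = x[p:] - design(x, p) @ coef  # residuals of x under this speaker's model
        scores[label] = mean_mahalanobis(r, mean, cov)
    return min(scores, key=scores.get)


def simulate_var1(A, T, rng):
    """Synthetic bivariate VAR(1) data with unit-variance Gaussian noise (illustration only)."""
    x = np.zeros((T, A.shape[0]))
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.standard_normal(A.shape[0])
    return x


rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1], [0.0, 0.5]])     # made-up dynamics for "speaker A"
B = np.array([[-0.6, 0.2], [0.3, -0.4]])   # made-up dynamics for "speaker B"
models = {"A": enrol(simulate_var1(A, 500, rng)),
          "B": enrol(simulate_var1(B, 500, rng))}
print(identify(simulate_var1(A, 500, rng), models))
```

This rule only separates speakers whose residual scales are comparable. A time-varying version would re-estimate the coefficients over time, which this sketch does not attempt.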
nonstationary time series; TVPVAR model; speaker recognition; Mahalanobis distance
Xingxing Lu, Wanchun Fei
College of Textile and Clothing Engineering, Soochow University, Suzhou, 215006, China College of Textile and Clothing Engineering, Soochow University, Suzhou, 215006, China National Engi
International conference
Shanghai
English
465-469
2011-07-26 (date first posted online on the Wanfang platform; not the paper's publication date)