Conference Papers

Automatic Language Identification Using Support Vector Machines and Phonetic N-gram

In this paper, we describe two approaches to language identification (LID) using support vector machines (SVMs) and phonetic n-grams. The first uses the language model scores of phone sequences to train the SVMs. The second uses the phone n-gram probabilities to train the SVM models, and for this approach we propose a new, effective normalization method. In 30 s tests on 5 languages, the new normalization method gives a relative reduction of 15.8% in equal error rate (EER) compared with the traditional one. With it, the system based on the second approach reaches an EER of 2.4%, a relative reduction of about 35.5% compared with the first approach. Details of the implementation and the experimental results are presented in this paper.
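The abstract does not spell out how the second approach turns a phone sequence into an SVM input, nor what the proposed normalization is. The sketch below shows one common way to build such an input: the relative frequency of every possible phone n-gram in a phone recognizer's output, arranged in a fixed order so that every utterance maps to a vector of the same length. It is an illustration under that assumption, not the authors' method. The function name `ngram_probability_vector`, the toy phone inventory, and the toy sequence are hypothetical, and neither the paper's normalization nor the SVM training itself is reproduced.

```python
from collections import Counter
from itertools import product


def ngram_probability_vector(phones, inventory, n=2):
    """Hypothetical helper: relative frequency of each phone n-gram.

    Assumes `phones` is a decoded phone sequence whose symbols all come
    from `inventory`. Returns a list of length len(inventory) ** n,
    ordered by itertools.product over the inventory.
    """
    # Count every overlapping n-gram in the decoded sequence.
    counts = Counter(
        tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)
    )
    total = sum(counts.values())
    # Enumerate all possible n-grams so each utterance gets the same dimension.
    keys = list(product(inventory, repeat=n))
    if total == 0:
        # Sequence shorter than n: no n-grams observed.
        return [0.0] * len(keys)
    # Divide counts by the total number of n-grams to get probabilities.
    return [counts[k] / total for k in keys]


if __name__ == "__main__":
    inventory = ["a", "b", "c"]  # toy phone set, not from the paper
    vec = ngram_probability_vector(["a", "b", "a", "b", "c"], inventory)
    print(vec)
```

For the toy sequence the four bigrams are `ab`, `ba`, `ab`, `bc`, so the `ab` entry is 0.5 and the `ba` and `bc` entries are 0.25 each. In an LID system, vectors like these would typically be computed for each training utterance and passed to one SVM per target language. Because the vector grows as |inventory|^n, practical systems usually restrict n or prune rare n-grams.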

Yan Deng, Jia Liu

Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

International Conference

2008 International Conference on Audio, Language and Image Processing

Zhenjiang

English

pp. 71-74

2008-07-07 (date first posted online on the Wanfang platform; not the paper's publication date)