Automatic Language Identification Using Support Vector Machines and Phonetic N-gram

Abstract:

In this paper, we describe two approaches to language identification (LID) using support vector machines (SVMs) and phonetic n-grams. The first uses the language model scores of phone sequences for SVM training. The second uses the n-gram probabilities of those phones to train the SVM models. For the second approach, we propose a new, effective normalization method. In 30-second tests on 5 languages, the new normalization method yields a relative reduction of 15.8% in equal error rate (EER) compared with the traditional one. With it, the system using the second approach reaches an EER of 2.4%, a relative reduction of about 35.5% compared with the first approach. Details of the implementation and experimental results are presented in this paper.
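The abstract does not say how the n-gram probabilities are laid out as SVM inputs, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes a decoded phone sequence is already available, uses relative bigram frequencies as the probabilities, and does not attempt to reproduce the proposed normalization method. The function names, the toy phone inventory, and the choice of n = 2 are all hypothetical.

```python
from collections import Counter
from itertools import product


def ngram_probabilities(phones, n=2):
    """Return the relative frequency of each phone n-gram in one utterance."""
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def to_feature_vector(probs, inventory, n=2):
    """Lay the probabilities out in a fixed order so that every utterance
    maps to a vector of the same dimension (len(inventory) ** n)."""
    return [probs.get(g, 0.0) for g in product(inventory, repeat=n)]


# Toy example (assumed data): a decoded phone sequence and a 3-phone inventory.
phones = ["a", "b", "a", "b", "c"]
inventory = ["a", "b", "c"]
vector = to_feature_vector(ngram_probabilities(phones), inventory)
```

Vectors of this kind, one per utterance, could then be passed to any SVM trainer; the scaling applied before training is exactly where a normalization method such as the one the abstract mentions would come in.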

Authors: Yan Deng, Jia Liu

Affiliation: Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Conference type: International conference

Conference name: 2008 International Conference on Audio, Language and Image Processing

Conference location: Zhenjiang

Conference language: English

Pages: 71-74

Online publication date: 2008-07-07 (date first posted on the Wanfang platform; not the paper's publication date)
