A Telephone Speech Corpus of China's Minority Languages for Automatic Language Identification

Abstract:

Research in language identification requires a corpus of multilingual speech data to capture the distinguishing information within and across languages. In the past few decades, many statistical approaches to language identification have been developed based on two common public-domain corpora, which consist of telephone speech from about 26 languages and dialects. However, China's minority languages have not been used as target languages in the papers published so far. In our work, we select 9 typical minority languages of China, together with Mandarin, to construct our telephone speech corpus. The minority languages are Naxi, Miao, Bai, Dai, Yi, Zhuang, Uygur, Mongolian and Tibetan. Each minority language represents its own ethnic minority. The corpus can be used to study, develop, evaluate and compare algorithms for identifying minority languages. Moreover, it will encourage linguistic researchers to pay more attention to the long history and splendid culture of China's ethnic minorities.

Keywords: language identification; telephone speech corpus; minority languages

Authors: Xiuhua Zeng, Jian Yang, Libo Zuo, Yonghua Xu

Affiliations: School of Physics and Electronic Engineering, Qujing Normal University, Qujing 655011, China; School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650091, China

Conference type: International conference

Conference name: 2010 4th International Conference on Intelligent Information Technology Application (IITA 2010)

Conference location: Qinhuangdao

Conference language: English

Pages: 79-82

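Online publication date: 2010-11-05 (the date the record first went online on the Wanfang platform; this is not the paper's publication date)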
