Conference Topic

A Telephone Speech Corpus of China's Minority Languages for Automatic Language Identification

Research in language identification requires corpora of multilingual speech data to capture the distinguishing information within and across languages. In the past few decades, many statistical approaches to language identification have been developed based on two common public-domain corpora, which consist of telephone speech from about 26 languages and dialects. However, China's minority languages have not been used as target languages in published papers up to now. In our work, we select 9 typical minority languages of China, together with Mandarin, to construct our telephone speech corpus. These minority languages are Naxi, Miao, Bai, Dai, Yi, Zhuang, Uygur, Mongolian and Tibetan. Each minority language represents its minority nationality. The corpus can be used to study, develop, evaluate and compare minority language identification algorithms. Moreover, it will encourage linguistic researchers to pay more attention to the long history and splendid culture of our national minorities.

language identification; telephone speech corpus; minority languages

Xiuhua Zeng, Jian Yang, Libo Zuo, Yonghua Xu

School of Physics and Electronic Engineering, Qujing Normal University, Qujing, 655011, China; School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, 650091, China

International conference

2010 4th International Conference on Intelligent Information Technology Application (IITA 2010)

Qinhuangdao

English

79-82

2010-11-05 (date first made available online on the Wanfang platform; not the paper's publication date)