A Telephone Speech Corpus of China's Minority Languages for Automatic Language Identification
Research in language identification requires corpora of multilingual speech data to capture the distinguishing information within and across languages. In the past few decades, many statistical approaches to language identification have been developed based on two common public-domain corpora, which consist of telephone speech from about 26 languages and dialects. However, China's minority languages have not been used as target languages in the published papers to date. In our work, we select 9 typical minority languages of China, together with Mandarin, to construct our telephone speech corpus. The minority languages are Naxi, Miao, Bai, Dai, Yi, Zhuang, Uygur, Mongolian and Tibetan. Each minority language represents its minority nationality. The corpus can be used to study, develop, evaluate and compare minority language identification algorithms. Moreover, it will encourage linguistic researchers to pay more attention to the long history and splendid culture of our national minorities.
language identification; telephone speech corpus; minority languages
Xiuhua Zeng, Jian Yang, Libo Zuo, Yonghua Xu
School of Physics and Electronic Engineering, Qujing Normal University, Qujing, 655011, China; School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, 650091, China
International conference
Qinhuangdao
English
79-82
2010-11-05 (date first posted online on the Wanfang platform; not the paper's publication date)