Unconstrained Many-to-Many Alignment for Automatic Pronunciation Annotation
An alignment between graphemes and phonemes is vital for annotating the pronunciation of out-of-vocabulary words. We desire an alignment that is (1) many-to-many and (2) fine-grained. A traditional one-to-one alignment model cannot represent an intuitive mapping for logograms, such as Chinese characters, and has been reported to yield inferior performance in phoneme prediction. A conventional many-to-many alignment model prefers mappings consisting of longer substrings, which degrades the generalization ability of the prediction model, especially for out-of-vocabulary words. To obtain a more general model, we introduce city block distance into the conventional many-to-many alignment, so that fine-grained mappings are inferred without constraining the maximum lengths of the grapheme and phoneme substrings. Experimental results show that our extension improves baseline grapheme-to-phoneme conversion on several language data sets.
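The idea of discouraging long chunks with a city-block penalty can be illustrated with a small decoding sketch. This is not the paper's implementation: the chunk-probability table is a toy, and the penalty is assumed to be the city-block distance from the one-to-one point (1, 1), i.e. (x-1)+(y-1) for a chunk mapping x graphemes to y phonemes, so a one-to-one mapping costs nothing while coarse chunks are discounted. No maximum chunk length is imposed.

```python
import math

# Hypothetical chunk-probability table (toy values for illustration);
# keys are (grapheme substring, phoneme-subsequence tuple).
CHUNK_P = {
    ("sh", ("SH",)): 0.8,
    ("i",  ("AY",)): 0.7,
    ("ne", ("N",)):  0.6,
    ("shine", ("SH", "AY", "N")): 0.9,
}

def align(word, phones, lam=1.0):
    """Viterbi decoding of the best many-to-many alignment.

    Each chunk mapping x graphemes to y phonemes is scored by
    log p(chunk) - lam * ((x - 1) + (y - 1)): the chunk probability
    discounted by the city-block distance from the one-to-one point
    (1, 1) -- an assumed form of the penalty. No maximum chunk
    length is imposed (unconstrained).
    """
    n, m = len(word), len(phones)
    # best[i][j] = (best score aligning word[:i] with phones[:j], backpointer)
    best = [[(-math.inf, None)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0.0, None)
    for i in range(n + 1):
        for j in range(m + 1):
            score, _ = best[i][j]
            if score == -math.inf:
                continue
            for x in range(1, n - i + 1):
                for y in range(1, m - j + 1):
                    chunk = (word[i:i + x], tuple(phones[j:j + y]))
                    p = CHUNK_P.get(chunk)
                    if p is None:
                        continue
                    s = score + math.log(p) - lam * ((x - 1) + (y - 1))
                    if s > best[i + x][j + y][0]:
                        best[i + x][j + y] = (s, (i, j, chunk))
    # Trace back the chunk sequence.
    chunks, i, j = [], n, m
    while (i, j) != (0, 0):
        _, back = best[i][j]
        if back is None:
            return None  # no alignment covers the whole pair
        i, j, chunk = back
        chunks.append(chunk)
    return list(reversed(chunks))
```

With the penalty active (`lam=1.0`), the fine-grained segmentation `sh→SH, i→AY, ne→N` beats the single coarse chunk `shine→SH AY N` even though the coarse chunk has the higher raw probability; with `lam=0.0` the coarse chunk wins, mirroring the long-substring bias of the conventional model described above.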
Keigo Kubo Hiromichi Kawanami Hiroshi Saruwatari Kiyohiro Shikano
Graduate School of Information Science, Nara Institute of Science and Technology, Japan
International conference
2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)
Xi'an, China
English
1-4
2011-10-18 (date first posted on the Wanfang platform; not necessarily the publication date)