A Novel Interpolated N-gram Language Model Based on Class Hierarchy

Abstract:

In this paper, we propose a novel interpolated language model that combines interpolation and backing-off along a class hierarchy. We also present the corresponding approach to estimating the interpolation coefficients. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary hierarchically into a word-clustering tree. The tree is used to balance the generalization ability of classes against the specificity of words when estimating the likelihood of an n-gram event. Experiments are performed on the Reuters corpus using a vocabulary of 27,000 words. Results show a 12% reduction in test perplexity relative to the standard modified Kneser-Ney (KN) n-gram approach.
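The abstract does not state the model's formula, so the following is only a minimal sketch of the general idea of mixing estimates along a class chain, not the authors' method. It assumes a hand-made toy hierarchy in place of an MDI-built tree, fixed weights in place of estimated interpolation coefficients, and a bigram history; `CHAIN`, `train`, `interpolated_prob`, and the weight values are all hypothetical names and numbers chosen for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (not produced by MDI clustering): each word maps
# to its chain of nodes, from the word itself up to the root of the tree.
CHAIN = {
    "monday": ["monday", "DAY", "ROOT"],
    "friday": ["friday", "DAY", "ROOT"],
    "on": ["on", "PREP", "ROOT"],
    "rose": ["rose", "VERB", "ROOT"],
}


def train(bigrams):
    """Count (context node, next word) pairs at every level of the history's chain."""
    pair_counts = Counter()
    context_counts = Counter()
    for history, word in bigrams:
        for node in CHAIN[history]:
            pair_counts[(node, word)] += 1
            context_counts[node] += 1
    return pair_counts, context_counts


def interpolated_prob(word, history, pair_counts, context_counts, weights):
    """Mix maximum-likelihood estimates along the chain.

    Levels whose context was never observed are skipped and the remaining
    weights are renormalized, a crude stand-in for backing off to a class.
    """
    total = 0.0
    used_weight = 0.0
    for node, weight in zip(CHAIN[history], weights):
        if context_counts[node] > 0:
            total += weight * pair_counts[(node, word)] / context_counts[node]
            used_weight += weight
    return total / used_weight if used_weight > 0 else 0.0


bigrams = [("on", "monday"), ("on", "friday"), ("monday", "rose"), ("rose", "on")]
pair_counts, context_counts = train(bigrams)
weights = [0.6, 0.3, 0.1]  # assumed fixed values: word level, class level, root level

# "monday rose" was observed; "friday rose" was not, but "friday" shares the
# DAY class with "monday", so it still receives a non-zero estimate.
p_seen = interpolated_prob("rose", "monday", pair_counts, context_counts, weights)
p_unseen = interpolated_prob("rose", "friday", pair_counts, context_counts, weights)
print(p_seen, p_unseen)
```

In this toy setup the unseen bigram borrows probability mass from its class, which is the trade-off between class generalization and word specificity that the abstract describes.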

Keywords: language model; class hierarchy; cluster; interpolate; back-off

Authors: Zhenyu Lv, Wenju Liu, Zhanlei Yang

Affiliation: Institute of Automation, Chinese Academy of Sciences, Beijing, China

Conference type: International conference

Conference name: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-5

Online publication date: 2009-09-24 (the date the record first went online on the Wanfang platform, not the paper's publication date)
