FEATURE DISTRIBUTIONS IN EXPONENTIAL LANGUAGE MODELS
Considering the distribution of features, rather than just the counts of feature occurrences in a sequence, makes exponential language models more powerful at capturing global language phenomena. This paper constructs an exponential language model with binary-variable feature distributions and trains it with the minimum sample risk method, utilizing more features and adjusting their parameters. We show that a model trained on a Chinese Internet chat corpus obtains up to a 19% improvement in sentence correct rate and up to a 7.46% improvement in Chinese character correct rate over the baseline model.
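As a minimal illustration of the model family the abstract describes, the sketch below implements a generic exponential (log-linear) language model, P(w|h) = exp(Σᵢ λᵢ fᵢ(h, w)) / Z(h), with binary indicator features. The feature names, toy vocabulary, and weights are all hypothetical; the paper's actual feature set and minimum sample risk training are not reproduced here.

```python
import math

def features(history, word):
    """Binary indicator features over a (history, word) pair.
    Feature names here are illustrative, not the paper's feature set."""
    return {
        "bigram:%s_%s" % (history[-1], word): 1.0,
        "unigram:%s" % word: 1.0,
    }

def prob(history, word, vocab, weights):
    """Probability of `word` after `history`, normalized over `vocab`."""
    def score(w):
        # Linear score: sum of lambda_i * f_i(h, w) for fired features.
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(history, w).items())
    z = sum(math.exp(score(w)) for w in vocab)  # partition function Z(h)
    return math.exp(score(word)) / z

# Toy example with hypothetical weights (training is not shown).
vocab = ["hello", "goodbye", "world"]
weights = {"bigram:hello_world": 1.5, "unigram:goodbye": 0.5}
p = prob(["hello"], "world", vocab, weights)
```

In the full model, the weights would be fit by minimum sample risk training, which directly tunes parameters against a task-level error metric rather than likelihood.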
Exponential language models; binary variable distributions; minimum sample risk
Huixing Jiang Xiaojie Wang
Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China
International conference
Beijing
English
252-256
2009-11-06 (date first posted on the Wanfang platform; does not represent the paper's publication date)