Conference Topic

FEATURE DISTRIBUTIONS IN EXPONENTIAL LANGUAGE MODELS

Considering the distribution of features, rather than just the counts of feature occurrences in a sequence, makes exponential language models more powerful at capturing global language phenomena. This paper constructs an exponential language model with binary-variable distributions of features and trains it with the minimum sample risk method, utilizing more features and adjusting their parameters. We show that the model, trained on a Chinese Internet chat corpus, obtains up to a 19% improvement in sentence correct rate and up to a 7.46% improvement in Chinese character correct rate compared to the baseline model.
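The model form described in the abstract can be sketched as a standard exponential (maximum-entropy) language model with binary feature indicators, P(w | h) ∝ exp(Σᵢ λᵢ fᵢ(h, w)). The feature templates, vocabulary, and weights below are illustrative assumptions for the general model family, not the paper's actual feature set or trained parameters:

```python
import math
from collections import defaultdict

# Sketch of an exponential language model with binary features:
#   P(w | h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h)
# Each f_i is a 0/1 indicator; only the active features contribute.

def binary_features(history, word):
    """Return the set of binary features active for (history, word).
    Unigram and bigram indicators only; real models use richer templates."""
    feats = {("unigram", word)}
    if history:
        feats.add(("bigram", history[-1], word))
    return feats

def score(weights, history, word):
    # Linear score: sum of weights of active binary features.
    return sum(weights.get(f, 0.0) for f in binary_features(history, word))

def prob(weights, vocab, history, word):
    # Normalize over the vocabulary to get P(word | history).
    z = sum(math.exp(score(weights, history, w)) for w in vocab)
    return math.exp(score(weights, history, word)) / z

# Toy vocabulary and hand-set weights (purely illustrative).
vocab = ["我", "你", "好"]
weights = defaultdict(float, {("unigram", "好"): 0.5,
                              ("bigram", "你", "好"): 1.0})
p = prob(weights, vocab, ["你"], "好")
```

Training methods such as minimum sample risk adjust the λᵢ to directly optimize a task metric (e.g. character correct rate) rather than likelihood; the normalizer Z(h) is what makes large-vocabulary training expensive in practice.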

Keywords: exponential language models; binary variable distribution; minimum sample risk

Huixing Jiang Xiaojie Wang

Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China

International Conference

2009 IEEE International Conference on Network Infrastructure and Digital Content (IEEE IC-NIDC 2009)

Beijing

English

252-256

2009-11-06 (date first posted on the Wanfang platform; not necessarily the paper's publication date)