FEATURE DISTRIBUTIONS IN EXPONENTIAL LANGUAGE MODELS
Considering the distribution of features, rather than just the counts of feature occurrences in a sequence, makes exponential language models more powerful at capturing global language phenomena. This paper constructs an exponential language model with binary-variable feature distributions and trains it with the minimum sample risk method, utilizing more features and adjusting their parameters. We show that a model trained on a Chinese Internet chat corpus obtains up to a 19% improvement in sentence correct rate and up to a 7.46% improvement in Chinese character correct rate over the baseline model.
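As a minimal illustration of the model family the abstract describes, the sketch below implements a generic exponential (log-linear) language model, P(w|h) = exp(Σᵢ λᵢ fᵢ(h, w)) / Z(h), with binary indicator features. The feature names, toy vocabulary, and weights are all hypothetical; the paper's actual feature set and minimum sample risk training are not reproduced here.

```python
import math

def features(history, word):
    """Binary indicator features over a (history, word) pair.
    Feature names here are illustrative, not the paper's feature set."""
    return {
        "bigram:%s_%s" % (history[-1], word): 1.0,
        "unigram:%s" % word: 1.0,
    }

def prob(history, word, vocab, weights):
    """Probability of `word` after `history`, normalized over `vocab`."""
    def score(w):
        # Linear score: sum of lambda_i * f_i(h, w) for fired features.
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(history, w).items())
    z = sum(math.exp(score(w)) for w in vocab)  # partition function Z(h)
    return math.exp(score(word)) / z

# Toy example with hypothetical weights (training is not shown).
vocab = ["hello", "goodbye", "world"]
weights = {"bigram:hello_world": 1.5, "unigram:goodbye": 0.5}
p = prob(["hello"], "world", vocab, weights)
```

In the full model, the weights would be fit by minimum sample risk training, which directly tunes parameters against a task-level error metric rather than likelihood.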
Exponential language models; binary variable distributions; minimum sample risk
Huixing Jiang Xiaojie Wang
Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China
International conference
Beijing
English
252-256
2009-11-06 (date first posted on the Wanfang platform; does not represent the paper's publication date)