Conference paper

The Structure of Unseen Trigrams and its Application to Language Models: a First Investigation

In a series of preparatory experiments in 4 languages on subsets of the Europarl corpus, we show that a large number of unseen trigrams can be reconstructed by proportional analogy from the trigrams with the lowest frequencies. We derive a very simple smoothing scheme from this empirical result and show that it outperforms the Good-Turing and Kneser-Ney smoothing schemes on trigram models in all 11 languages of the common multilingual part of the Europarl corpus, except Finnish.

Index Terms: trigram language models, structure of unseen trigrams, Europarl

Yves Lepage, Julien Gosme, Adrien Lardilleux

IPS Graduate School, Waseda University, Kita-Kyushu, 808-0135, Japan; GREYC, Université de Caen Basse-Normandie, F-14032, Caen, France

International conference

2010 4th International Universal Communication Symposium (IUCS 2010)

Beijing

English

272-279

2010-10-18 (date first made available online on the Wanfang platform; not the publication date of the paper)