Thai Sentence-Breaking for Large-Scale SMT
Thai language text presents challenges for integration into large-scale multi-language statistical machine translation (SMT) systems, largely stemming from the nominal lack of punctuation and inter-word space. For Thai sentence breaking, we describe a monolingual maximum entropy classifier with features that may be applicable to other languages such as Arabic, Khmer and Lao. We apply this sentence breaker to our large-vocabulary, general-purpose, bidirectional Thai-English SMT system, and achieve BLEU scores of around 0.20, reaching our threshold of releasing it as a free online service.
Glenn Slayden, Mei-Yuh Hwang, Lee Schwartz
thai-language.com; Microsoft Research
International conference
The 23rd International Conference on Computational Linguistics
Beijing
English
8-16
2010-08-01 (date first posted online on the Wanfang platform; not the paper's publication date)