Affix-Augmented Stem-Based Language Model for Persian

Abstract:

Language modeling, which assigns a probability to a sequence of words, is used in many NLP applications such as machine translation, POS tagging, speech recognition, and information retrieval. The task becomes challenging for highly inflectional languages. In this paper we investigate standard statistical language models on Persian, an inflectional language. We propose two variations of morphological language models that rely on a morphological analyzer to transform the dataset before modeling. We then discuss the shortcomings of these models and introduce a novel approach that exploits the structure of the language and produces more accurate results. Experimental results are encouraging, especially for n-gram models trained on small datasets.
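As a rough illustration of the general idea of splitting words into stems and affixes before n-gram estimation, here is a minimal sketch. It is not the authors' method. `TOY_ANALYSIS` is a hypothetical lookup table standing in for a real morphological analyzer, the romanized Persian words are illustrative only, and add-one (Laplace) smoothing is an assumed choice made for simplicity.

```python
import math
from collections import Counter

# Hypothetical toy table standing in for a real morphological analyzer.
# Romanized Persian: "ketabha" = stem "ketab" (book) + plural suffix "-ha".
TOY_ANALYSIS = {
    "ketabha": ["ketab", "+ha"],
    "daneshjuyan": ["daneshju", "+yan"],
}


def segment(sentence):
    """Split each word into stem and affix tokens; unknown words stay whole."""
    tokens = []
    for word in sentence.split():
        tokens.extend(TOY_ANALYSIS.get(word, [word]))
    return ["<s>"] + tokens + ["</s>"]


class BigramLM:
    """Bigram model with add-one (Laplace) smoothing over segmented tokens."""

    def __init__(self, sentences):
        self.history = Counter()  # c(prev): times a token is followed by another
        self.bigrams = Counter()  # c(prev, word)
        self.vocab = set()        # tokens that can appear as the next token
        for s in sentences:
            toks = segment(s)
            self.history.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
            self.vocab.update(toks[1:])

    def prob(self, prev, word):
        """P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def log_prob(self, sentence):
        """Log-probability of a whole sentence under the bigram model."""
        toks = segment(sentence)
        return sum(math.log(self.prob(p, w)) for p, w in zip(toks, toks[1:]))


lm = BigramLM(["man ketab daram", "ketabha inja hastand"])
print(lm.log_prob("man ketabha daram"))
```

Because "ketabha" is split into "ketab" and "+ha", the stem "ketab" shares counts across inflected forms. A word-level model would treat "ketabha" as an unrelated token, which is one reason segmentation can help when training data is small. The sketch does not reserve probability mass for out-of-vocabulary tokens.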

Keywords: Tracking; language model; n-gram; morphological; Persian

Authors: Heshaam FAILI, Hadi RAVANBAKHSH

Affiliation: Dept. of ECE, University of Tehran, Tehran, Iran

Conference type: International conference

Conference name: The 6th IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010)

Conference location: Beijing

Conference language: English

Pages: 1-4

Online publication date: 2010-08-21 (date first posted on the Wanfang platform; not the paper's publication date)

