Maximum Entropy Combined FSM Stemming Method for Uyghur
This paper presents a stemming algorithm for Uyghur that combines a noun-suffix deterministic finite automaton (DFA) with a Maximum Entropy (MaxEnt) model. Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes are similar to the ending part of some words, which makes the FSM ambiguous. We apply the MaxEnt model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the MaxEnt suffix identification model, and combining MaxEnt with the FSM.
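The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The suffix inventory below is a small, simplified Latin-script placeholder that ignores vowel harmony and vowel reduction. The AMBIGUOUS set is hypothetical and is not the paper's list of eight suffixes. The is_suffix callback stands in for a trained MaxEnt classifier. The FSM is reduced to a fixed sequence of optional states visited in reverse morphotactic order (case, then possessive, then plural).

```python
from typing import Callable, List, Tuple

# Toy suffix classes, listed in REVERSE morphotactic order so that the
# outermost suffix is stripped first. Placeholder inventory (assumption).
SUFFIX_CLASSES = [
    ("case", ["ning", "ni", "da", "din"]),
    ("possessive", ["im", "ing"]),
    ("plural", ["lar", "ler"]),
]

# Hypothetical set of suffixes that can also be the tail of a bare stem.
AMBIGUOUS = {"ni", "da"}


def always_suffix(word: str, suffix: str) -> bool:
    """Default disambiguator: treat every match as a real suffix."""
    return True


def stem(
    word: str,
    is_suffix: Callable[[str, str], bool] = always_suffix,
    min_stem: int = 2,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Strip at most one suffix per class, walking the classes in order.

    is_suffix is consulted only for suffixes in AMBIGUOUS; a MaxEnt model
    would be plugged in here (assumption about how the pieces combine).
    """
    stripped: List[Tuple[str, str]] = []
    for label, suffixes in SUFFIX_CLASSES:
        # Try longer suffixes first so "ning" wins over "ni".
        for suffix in sorted(suffixes, key=len, reverse=True):
            if not word.endswith(suffix):
                continue
            if len(word) - len(suffix) < min_stem:
                continue
            if suffix in AMBIGUOUS and not is_suffix(word, suffix):
                continue  # classifier says this ending belongs to the stem
            word = word[: -len(suffix)]
            stripped.append((label, suffix))
            break
    return word, stripped
```

With the default callback, stem("kitablarimni") strips one case, one possessive, and one plural suffix in that order. Passing a callback that rejects the ambiguous ending leaves the word untouched, which is the behaviour a suffix identification model is meant to control.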
Aishan Wumaier, Zaokere Kadeer, Parida Tursun, Shengwei Tian
School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang, China, 830046
International conference
Beijing
English
51-55
2009-08-10 (date first posted online on the Wanfang platform; this is not the paper's publication date)