Conference Topic

Maximum Entropy Combined FSM Stemming Method for Uyghur

This paper presents a stemming algorithm for Uyghur that generates a deterministic finite automaton (DFA) for Uyghur noun suffixes and combines it with a Maximum Entropy (MaxEnt) model. Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight of the suffixes are similar to the endings of some words, and these suffixes make the FSM ambiguous. We apply a MaxEnt model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the MaxEnt suffix identification model, and combining the MaxEnt model with the FSM.
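The following Python sketch shows how a reverse-order suffix automaton and a MaxEnt disambiguator can fit together. Everything in it is an assumption for illustration and does not come from the paper:

- the suffix inventory, which is a handful of common Uyghur noun suffixes in Latin transliteration, with vowel harmony and vowel weakening ignored;
- the slot order stem + plural + possessive + case;
- `MIN_STEM`, the features, and the toy training pairs.

The sketch does not compile a minimized DFA. Instead, `candidates` enumerates every path that a right-to-left suffix automaton could take. The disambiguator is a conditional MaxEnt (log-linear) ranker over those paths, trained by gradient ascent. The paper's own suffix identification model may be set up differently.

```python
import math
from collections import defaultdict

# ASSUMPTION: illustrative subset of Uyghur noun suffixes (Latin transliteration),
# grouped by morphotactic slot. Nouns are modeled as stem + PLURAL + POSSESSIVE + CASE,
# so a right-to-left scan meets the slots in reverse order.
# Vowel harmony and vowel weakening are ignored.
SLOTS_REVERSED = [
    ("CASE", ["ning", "ni", "gha", "ge", "da", "de", "din"]),
    ("POSS", ["im", "i"]),
    ("PL", ["lar", "ler"]),
]
MIN_STEM = 2  # hypothetical minimum stem length


def candidates(word):
    """Enumerate every path through the reverse-order suffix automaton.

    Each slot is optional. When more than one suffix (or none) matches, the path
    splits; these splits are the ambiguity the MaxEnt model must resolve.
    """
    paths = [(word, ())]
    for _slot, suffixes in SLOTS_REVERSED:
        extended = []
        for stem, sufs in paths:
            extended.append((stem, sufs))  # skip this optional slot
            for s in suffixes:
                if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
                    extended.append((stem[: -len(s)], (s,) + sufs))
        paths = extended
    return paths


def features(stem, sufs):
    """Hypothetical binary features of one candidate segmentation."""
    feats = [f"n_suf={len(sufs)}"] + [f"suf={s}" for s in sufs]
    if len(stem) <= 2:
        feats.append("short_stem")
    return feats


def distribution(w, cands):
    """Conditional MaxEnt: p(c | word) = exp(w.f(c)) / sum over c' of exp(w.f(c'))."""
    scores = [sum(w[f] for f in features(*c)) for c in cands]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(data, lr=0.1, iters=1000):
    """Full-batch gradient ascent on the conditional log-likelihood."""
    w = defaultdict(float)
    for _ in range(iters):
        grad = defaultdict(float)
        for word, gold_stem in data:
            cands = candidates(word)
            gold = next(c for c in cands if c[0] == gold_stem)
            for f in features(*gold):  # observed feature counts
                grad[f] += 1.0
            for c, p in zip(cands, distribution(w, cands)):
                for f in features(*c):  # minus expected feature counts
                    grad[f] -= p
        for f, g in grad.items():
            w[f] += lr * g
    return w


def best_segmentation(w, word):
    """Return the highest-probability (stem, suffixes) path for a word."""
    cands = candidates(word)
    probs = distribution(w, cands)
    return cands[max(range(len(cands)), key=probs.__getitem__)]


# Toy (word, gold stem) pairs invented for illustration; NOT data from the paper.
TOY_DATA = [
    ("kitablar", "kitab"),
    ("kitabda", "kitab"),
    ("kitabni", "kitab"),
    ("dada", "dada"),  # the stem's ending looks like the locative suffix "da"
    ("dadalar", "dada"),
]
```

Because the scan runs right to left, the outermost (case) slot is tried first. Once a plural suffix has been stripped, no case-like ending to its left can be removed. Morphotactics alone therefore prune many paths, and the MaxEnt scores only need to choose among the paths that survive. For example, `candidates("kitabni")` yields `kitab` + `ni`, `kitabn` + `i`, and the unsegmented word.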

Aishan Wumaier, Zaokere Kadeer, Parida Tursun, Shengwei Tian

School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang, China, 830046

International Conference

2009 Oriental COCOSDA International Conference on Speech Database and Assessments (Chinese title: 2009 国际语音交互标准数据评估技术大会)

Location: Beijing

Language: English

Pages: 51-55

2009-08-10 (date first posted online on the Wanfang platform; not the paper's publication date)