An Improved Forward Maximum Matching Algorithm for Chinese Word Segmentation
With the original Forward Maximum Matching (FMM) algorithm, the maximum word length is initialized to a constant. As a result, long Chinese words cannot be segmented correctly, and segments are matched repeatedly. This paper presents an improved FMM that adjusts the maximum word length dynamically. In addition, a new Chinese word segmentation lexicon is introduced that works in concert with the improved algorithm. Compared with the original FMM, the improved algorithm dramatically reduces the number of times each segment is matched. Furthermore, analysis shows a clear improvement in both the speed and the efficiency of Chinese word segmentation.
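The abstract does not spell out the lexicon layout or the adjustment rule, so the following is only a minimal sketch of one way a dynamically adjusted maximum word length could work. It assumes a hash table keyed by a word's first character, with each entry storing the longest word that starts with that character. The names `build_lexicon` and `fmm_segment` and the entry fields `max_len` and `words` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_lexicon(words):
    """Group words by first character and record the longest word per group.

    Assumed layout (not from the paper): first char -> {"max_len", "words"}.
    """
    lexicon = defaultdict(lambda: {"max_len": 1, "words": set()})
    for word in words:
        if not word:
            continue  # skip empty entries
        entry = lexicon[word[0]]
        entry["words"].add(word)
        entry["max_len"] = max(entry["max_len"], len(word))
    return dict(lexicon)


def fmm_segment(text, lexicon):
    """Forward maximum matching with a per-character maximum word length."""
    tokens = []
    i = 0
    while i < len(text):
        entry = lexicon.get(text[i])
        match_len = 1  # fall back to a single character
        if entry is not None:
            # The window size depends on the current first character,
            # instead of one global constant.
            longest = min(entry["max_len"], len(text) - i)
            for length in range(longest, 1, -1):
                if text[i:i + length] in entry["words"]:
                    match_len = length
                    break
        tokens.append(text[i:i + match_len])
        i += match_len
    return tokens


lexicon = build_lexicon(["中华人民共和国", "中国", "人民", "成立"])
print(fmm_segment("中华人民共和国成立了", lexicon))
```

In this sketch a long word such as 中华人民共和国 is matched on the first attempt because the window for 中 is seven characters, while a character absent from the lexicon costs no lookups at all. A fixed window would either truncate the long word or waste attempts on short ones.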
forward maximum matching; Chinese word segmentation; lexicon structure; hash table
Jing Luan, Ruilei Wang, Xiupei Lu
College of the Computer Science and Technology, Xinjiang Normal University, 102 Xinyi Road, Urumqi, China
International conference
2010 International Conference on Future Information Technology (ICFIT 2010)
Changsha
English
991-994
2010-12-14 (date first posted online on the Wanfang platform; not necessarily the paper's publication date)