Conference Paper

Application of Word Segment Based on Suffix Array in Web Text Mining

Word segmentation is the basis of Web text mining, but the growing number of new words on the internet has seriously degraded segmentation performance, and the recognition rate for new words is low. To address this problem in Web text mining, this paper proposes and implements a suffix-array approach: a suffix array is built for word segmentation, and the number and length of common prefixes can then be calculated. Candidates are filtered by comparison with a threshold, so that documents can be segmented automatically. The results show that this method has advantages for new word recognition in Web text mining.
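The pipeline the abstract outlines (suffix array, common prefix lengths, threshold filter) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the naive sort-based construction, the length bounds, and the use of an occurrence count as the threshold are all assumptions, since the abstract does not specify them.

```python
def build_suffix_array(text):
    """Return the start indices of all suffixes of `text` in lexicographic order."""
    # Naive construction by sorting suffix slices; adequate for short inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] is the common prefix length of suffixes sa[k-1] and sa[k]; lcp[0] is 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def candidate_words(text, min_len=2, max_len=4, min_count=2):
    """Map each substring of length min_len..max_len to its occurrence count,
    keeping only those that occur at least `min_count` times.

    The frequency threshold is an assumed filter; the abstract only says that
    candidates are compared with "the threshold".
    """
    sa = build_suffix_array(text)
    lcp = lcp_array(text, sa)
    counts = {}
    for length in range(min_len, max_len + 1):
        run = 1  # number of adjacent suffixes sharing the same `length`-prefix
        for k in range(1, len(sa) + 1):
            if k < len(sa) and lcp[k] >= length:
                run += 1
                continue
            # The run ends at position k - 1 of the suffix array.
            start = sa[k - 1]
            if run >= min_count and len(text) - start >= length:
                counts[text[start:start + length]] = run
            run = 1
    return counts
```

For instance, `candidate_words("abcabcab", min_len=2, max_len=2, min_count=3)` keeps only `"ab"`, the single two-character substring that occurs three times. Because the suffix array places all occurrences of a substring next to each other, counting them reduces to scanning runs in the common prefix array, which is why no dictionary of known words is needed to surface repeated strings as new-word candidates.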

Suffix Array; Chinese Word Segment; Web Text Mining

Zhang Qiuhong; Su Jin; Yang Xinfeng; Ren Xueli

Department of Computer Science and Technology, Nanyang Institute of Technology, Nanyang, China; Department of Computer Science and Engineering, Qujing Normal University, Qujing, China

International conference

2011 3rd International Conference on Computer Engineering and Applications (ICCEA2011)

Haikou

English

636-638

2011-07-15 (date first posted online on the Wanfang platform; not the paper's publication date)