DESIGN AND IMPLEMENTATION OF TEXT FILTERING WITH NO SEMANTIC ACCIDENTAL INJURY
Information filtering on the Internet refers to finding and filtering bad words in large-scale web text. Accuracy and efficiency are the main concerns. This paper focuses on filtering mixed Chinese and English text. It proposes a Chinese and English text filtering algorithm, the No Semantic Accidental Injury Filter (NSAIF) algorithm, which avoids semantic accidental injury. NSAIF is based on the Aho-Corasick (AC) algorithm but avoids space expansion by using dynamic memory allocation. It applies to both Chinese and English text by using one-byte storage. It uses the longest match principle to find the words that should be filtered in a trie augmented with failure pointers. It shows good time and space performance on test data sets of different sizes and has high theoretical and practical value.
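The ingredients named in the abstract (a trie with failure pointers, byte-level storage, child nodes allocated on demand, and the longest match principle) can be illustrated with a minimal sketch. This is a plain Aho-Corasick filter, not the NSAIF algorithm itself, whose details the abstract does not give. The names `Node`, `build`, `find_matches` and `filter_text`, the use of UTF-8 as the byte encoding, and the rule that overlapping matches are resolved leftmost-first and then longest-first are all assumptions made for illustration.

```python
from collections import deque


class Node:
    __slots__ = ("children", "fail", "out_len")

    def __init__(self):
        self.children = {}  # byte value -> Node, created only when needed
        self.fail = None    # failure pointer (longest proper suffix in the trie)
        self.out_len = 0    # byte length of the longest word ending here, 0 if none


def build(words):
    """Build a byte-level trie with failure pointers (assumes UTF-8 bytes)."""
    root = Node()
    for word in words:
        raw = word.encode("utf-8")
        node = root
        for b in raw:
            node = node.children.setdefault(b, Node())
        node.out_len = len(raw)
    root.fail = root
    queue = deque()
    for child in root.children.values():
        child.fail = root
        queue.append(child)
    while queue:  # breadth-first, so shallower nodes are finished first
        node = queue.popleft()
        for b, child in node.children.items():
            f = node.fail
            while f is not root and b not in f.children:
                f = f.fail
            child.fail = f.children.get(b, root)
            # A word ending at the failure node also ends here; keep the longest.
            child.out_len = max(child.out_len, child.fail.out_len)
            queue.append(child)
    return root


def find_matches(root, data):
    """Return (start, end) byte spans: the longest word ending at each position."""
    spans = []
    node = root
    for i, b in enumerate(data):
        while node is not root and b not in node.children:
            node = node.fail
        node = node.children.get(b, root)
        if node.out_len:
            spans.append((i + 1 - node.out_len, i + 1))
    return spans


def filter_text(root, text, mask="*"):
    """Mask matched words, preferring the leftmost and then the longest match."""
    data = text.encode("utf-8")
    spans = sorted(find_matches(root, data), key=lambda s: (s[0], s[0] - s[1]))
    out, pos = [], 0
    for start, end in spans:
        if start < pos:
            continue  # overlaps a span that was already masked
        out.append(data[pos:start].decode("utf-8"))
        # One mask symbol per character, not per byte.
        out.append(mask * len(data[start:end].decode("utf-8")))
        pos = end
    out.append(data[pos:].decode("utf-8"))
    return "".join(out)


root = build(["bad", "badword", "坏话"])
print(filter_text(root, "a badword 和坏话 here"))  # prints: a ******* 和** here
```

Because the trie is keyed on single bytes, the same structure handles English and Chinese words without a separate code path. Matching on UTF-8 bytes still yields spans that fall on character boundaries, since a word's first byte can never match a continuation byte.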
text filtering; AC; Chinese and English; semantic accidental injury; longest match principle
Danfeng Yan, Jia Liu, Fangchun Yang
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
International conference
Shenzhen
English
61-65
2011-10-28 (date first posted online on the Wanfang platform; not the paper's publication date)