Conference Topic

DESIGN AND IMPLEMENTATION OF TEXT FILTERING WITH NO SEMANTIC ACCIDENTAL INJURY

Information filtering on the Internet refers to finding and filtering bad words in large-scale web text. Accuracy and efficiency are the main concerns. This paper focuses on filtering mixed Chinese and English text. It proposes a Chinese and English text filtering algorithm, the No Semantic Accidental Injury Filter (NSAIF), designed to avoid semantic accidental injury. NSAIF is based on the Aho-Corasick (AC) algorithm but avoids space expansion by using dynamic memory allocation. It applies to both Chinese and English text by using one-byte storage. It uses the longest match principle to find the words to be filtered in a trie augmented with failure pointers. It shows good time and space performance on test data sets of different sizes and has high theoretical and practical value.
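The abstract's core idea, a byte-level trie augmented with failure pointers and scanned under the longest match principle, can be illustrated with a minimal sketch. This is the standard Aho-Corasick construction, not the NSAIF algorithm itself, whose details this record does not give. The class name `ByteAC`, its method names, the UTF-8 encoding, and the `*` mask character are all assumptions made for illustration. Child tables are dictionaries allocated on demand, which only loosely mirrors the dynamic memory allocation the abstract mentions.

```python
from collections import deque


class ByteAC:
    """Illustrative Aho-Corasick automaton over bytes (hypothetical, not NSAIF)."""

    def __init__(self, words, encoding="utf-8"):
        self.encoding = encoding      # assumption: the record names no encoding
        self.children = [{}]          # node -> {byte value: child node}
        self.fail = [0]               # failure pointer of each node
        self.out_len = [0]            # byte length of the keyword ending here, else 0
        self.out_link = [0]           # nearest proper-suffix node that ends a keyword
        for word in words:
            if word:                  # ignore empty keywords
                self._add(word.encode(encoding))
        self._build_failure_links()

    def _add(self, pattern):
        node = 0
        for b in pattern:
            nxt = self.children[node].get(b)
            if nxt is None:           # allocate a node only when it is needed
                nxt = len(self.children)
                self.children.append({})
                self.fail.append(0)
                self.out_len.append(0)
                self.out_link.append(0)
                self.children[node][b] = nxt
            node = nxt
        self.out_len[node] = len(pattern)

    def _build_failure_links(self):
        # Breadth-first, so every shallower node is finished before its descendants.
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for b, child in self.children[node].items():
                f = self.fail[node]
                while f and b not in self.children[f]:
                    f = self.fail[f]
                target = self.children[f].get(b, 0)
                self.fail[child] = target
                self.out_link[child] = target if self.out_len[target] else self.out_link[target]
                queue.append(child)

    def find_all(self, data):
        """Return every keyword occurrence in data as a (start, end) byte span."""
        node, hits = 0, []
        for i, b in enumerate(data):
            while node and b not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(b, 0)
            n = node if self.out_len[node] else self.out_link[node]
            while n:
                hits.append((i + 1 - self.out_len[n], i + 1))
                n = self.out_link[n]
        return hits

    def longest_matches(self, data):
        """Keep leftmost, longest, non-overlapping spans."""
        chosen, pos = [], 0
        for start, end in sorted(self.find_all(data), key=lambda h: (h[0], -h[1])):
            if start >= pos:
                chosen.append((start, end))
                pos = end
        return chosen

    def mask(self, text, fill="*"):
        """Overwrite each selected span with one fill character per byte."""
        data = bytearray(text.encode(self.encoding))
        for start, end in self.longest_matches(bytes(data)):
            data[start:end] = fill.encode("ascii") * (end - start)
        return data.decode(self.encoding)


ac = ByteAC(["bad", "badword", "坏话"])
print(ac.mask("a badword 和坏话"))
```

In this sketch the keyword "badword" is preferred over its prefix "bad" because spans are sorted by start position and then by decreasing length. Masking is done per byte, so a two-character Chinese keyword becomes six fill characters under UTF-8. That is a simplification of this sketch and says nothing about the paper's one-byte storage scheme.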

text filtering; AC; Chinese and English; semantic accidental injury; longest match principle

Danfeng Yan, Jia Liu, Fangchun Yang

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

International conference

2011 4th IEEE International Conference on Broadband Network & Multimedia Technology (4th IEEE IC-BNMT 2011)

Shenzhen

English

61-65

2011-10-28 (date first posted online on the Wanfang platform; not the paper's publication date)