Conference Paper

MEMORY-EFFICIENT REGULAR EXPRESSION MATCHING FOR CHINESE NETWORK CONTENT AUDIT

When matching Chinese keywords for network content audit, one of the biggest problems is interference from "noise characters", which makes the traditional approach of matching explicit string patterns infeasible. Regular expression matching solves this problem, but DFA-based approaches to regular expression matching run into excessive memory usage. In this paper, we address the problems encountered when applying regular expressions to Chinese network content audit. We propose regular expression rewriting techniques and a grouping principle that solve the excessive memory usage problem of the DFA-based approach. Our solution makes it practical to apply regular expressions to Chinese network content audit.
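The sketch below illustrates why a regular expression tolerates noise characters where a fixed string does not. It builds a pattern that accepts up to a bounded number of noise characters between consecutive keyword characters.

It rests on stated assumptions:

- The noise class (any character outside the CJK Unified Ideographs block U+4E00..U+9FFF) is a hypothetical choice.
- The limit `max_noise` is a hypothetical parameter.
- `keyword_pattern` is a hypothetical helper.

This is not the authors' rewriting technique or grouping principle.

```python
import re

# Hypothetical noise class: any character outside the CJK Unified Ideographs
# block (U+4E00..U+9FFF), e.g. punctuation, spaces, ASCII letters.
NOISE = r"[^\u4e00-\u9fff]"


def keyword_pattern(keyword: str, max_noise: int = 3) -> re.Pattern:
    """Compile a regex that matches `keyword` with up to `max_noise`
    noise characters inserted between consecutive keyword characters.

    `max_noise` is an illustrative bound, not a value from the paper.
    """
    gap = f"{NOISE}{{0,{max_noise}}}"  # e.g. [^\u4e00-\u9fff]{0,3}
    return re.compile(gap.join(re.escape(ch) for ch in keyword))


if __name__ == "__main__":
    pat = keyword_pattern("内容审计")
    # Noise between the characters is skipped over.
    print(pat.search("网络内*容 审-计系统").group())
    # A Chinese character in the gap is not treated as noise, so no match.
    print(pat.search("内容的审计"))
```

A plain substring search for 内容审计 would miss the first input. The regular expression finds `内*容 审-计`.

Bounded repetitions like `{0,3}` are expanded into many states, and many such patterns may be compiled together into one DFA. Either can enlarge the automaton, which is the memory problem the abstract refers to.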

Keywords: Regular expression; Chinese keyword matching; Network content audit

Zezhi Zhu Ping Lin Luying Chen Kun Zhang

School of Information and Communication Engineering Beijing University of Posts and Telecommunications

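Conference type: International conference

Conference: 2009 IEEE International Conference on Network Infrastructure and Digital Content (IEEE IC-NIDC 2009)

Location: Beijing

Language: English

Pages: 144-148

Online date: 2009-11-06 (date first made available on the Wanfang platform; not the paper's publication date)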