会议专题

Searching BWT against Pattern Matching Machine to Find Multiple String Matches

  In this paper,we discuss an indexing method for solving the multiple string pattern matching problem,by which we are given a set of short strings R = r1,…,rl and required to locate all substrings of a target string s such that each of them matches an rj in R.The main idea is to construct a pattern matching machine A and transform the reverse s of s to a BWT-array as an index,denoted as BWT(s),and search A against it.During the process,the failure function of A is used to decrease the subranges of BWT(s)to be searched at each step.In addition,we change a single-character checking against BWT(s)to a multiple-character checking,by which multiple searches of BWT(s)are reduced to a single scanning.In this way,high efficiency can be achieved.Extensive experiments have been conducted,which shows that our method works better than almost all the existing methods for this problem.

string matching DNA sequences multiple pattern machine automaton BWT-transformation

Yangjun Chen Yujia Wu

Dept.Applied Computer Science,University of Winnipeg,Canada

国际会议

第九届网络分布式计算与知识发现国际会议( 2017 International Conference on Cyber-enabled distributed computing and knowledge discovery)

南京

英文

167-176

2017-10-12(万方平台首次上网日期,不代表论文的发表时间)