Searching BWT against Pattern Matching Machine to Find Multiple String Matches
In this paper,we discuss an indexing method for solving the multiple string pattern matching problem,by which we are given a set of short strings R = r1,…,rl and required to locate all substrings of a target string s such that each of them matches an rj in R.The main idea is to construct a pattern matching machine A and transform the reverse s of s to a BWT-array as an index,denoted as BWT(s),and search A against it.During the process,the failure function of A is used to decrease the subranges of BWT(s)to be searched at each step.In addition,we change a single-character checking against BWT(s)to a multiple-character checking,by which multiple searches of BWT(s)are reduced to a single scanning.In this way,high efficiency can be achieved.Extensive experiments have been conducted,which shows that our method works better than almost all the existing methods for this problem.
string matching DNA sequences multiple pattern machine automaton BWT-transformation
Yangjun Chen Yujia Wu
Dept.Applied Computer Science,University of Winnipeg,Canada
国际会议
南京
英文
167-176
2017-10-12(万方平台首次上网日期,不代表论文的发表时间)