会议专题

Best Fitting Fized-Length Substring Patterns for a Set of Strings

Finding a pattern, or a set of patterns that best characterizes a set of strings is considered important in the context of Knowledge Discovery as applied in Molecular Biology. Our main objective is to address the problem of over-generalization, which is the phenomenon that a characterization is so general that it potentially includes many incorrect examples. To overcome this we formally define a criteria for a most fitting language for a set of strings, via a natural notion of density. We show how the problem can be solved by solving the membership problem and counting problem, and we study the runtime complexities of the problem with respect to three solution spaces derived from unions of the languages generated from fixed-length substring patterns. Two of these we show to be solvable in time polynomial to the input size. In the third case, however, the problem turns out to be NP-complete.

Hirotaka Ono Yen Kaow Ng

Kyushu University, Department of Computer Science and Communication Engineering, 6-10-1, Hakozaki, P Kyushu Institute of Technology, Graduate School of Computer Science and Systems Engineering, Iizuka,

国际会议

The 11th Annual International Computing and Combinatorics Conference COCOON 2005(第11届国际计算和组合会议)

昆明

英文

240-250

2005-08-01(万方平台首次上网日期,不代表论文的发表时间)