Eztended multiple breakpoint graphs and a fast and ezact DCJ median solver for linear genomes
The median problem lies at the heart of rearrangement-based phylogenetics; given a set of genomes G and a genomic distance measure d,it asks for a genome q that minimizes the total distance ∑I∈gdq. I. The median problem under the double-cut-andjoint (DCJ) distance 1 is known to be NP-hard 2,3. In our previous work on circular genomes 4,5, modeling them with multiple breakpoint graphs, we developped a graph-based recursive decomposition method, using so-called adequate subgraphs. The recursive decomposition provides a dramatic reduction in the size of the solution space. We designed an algorithm, ASMedian,to use this approach and tested it on simulated data, demonstrating speedups by at least six orders of magnitude over naive algorithms.When we turn to linear genomes, the problem arises of the cap assignment: how to determine the correspondence of telomeres between different genomes. The capping optimization problem-how to choose the cap assignment that leads to the smallest total genomic distance-is difficult: with just three genomes of k linear chromosomes each, its solution space is of size (2k)!. (2k)!. In this poster, we first extend multiple breakpoint graphs to model linear genomes, following the idea in 6, by merging all caps into one cap node. We apply our recursive decomposition method to this extended multiple breakpoint graph, thereby bypassing the capping optimization procedure, because adequate subgraphs remain usable in the parts of the multiple breakpoint graph that do not include the cap node. We also find cap-related adequate subgraphs with the largest occurrence probability to further improve the efficiency of this decomposition method. In tests run on simulated data, our algorithm works quickly under a wide range of parameter values, thus extending our previous success with circular chromosomes to linear chromosomes. It should be noted, however, that the average running time increases quickly with the number of linear chromosomes in each genome, indicating that our algorithm does not completely eschew all issues involved with capping optimization.
Andrew Wei Xu
School of Computer and Communication Sciences Swiss Federal institute of Technology (EPFL) EPFL IC LCBB INJ 231, Station 14 CH-1015 Lausanne, Switzerland
国际会议
The 7th Asia-Pacific Bioinformatics Conference(第七届亚太生物信息学大会)
北京
英文
912
2009-01-01(万方平台首次上网日期,不代表论文的发表时间)