Conference Topic

EULER: From Assembling Short DNA Reads to Shotgun Protein Sequencing by Assembling Mass Spectra

Current sequencing technologies attempt to maximize read length without sacrificing base-calling accuracy. Since fragment assembly with inaccurate reads is difficult, the common practice is to trim the inaccurate tails of the reads. We present a new computational technique, EULER-USR, for assembling short reads that bypasses the problem of limited accuracy in the tails of reads. An important and counterintuitive implication of this result is that one may extend sequencing reactions past their prime, to the point where the error rate grows above what is normally acceptable for fragment assembly. We compare EULER-USR with other short-read assemblers and illustrate its application to assembling mate-paired Illumina reads from bacterial genomes. We further address the problem of sequencing molecules that are not directly inscribed in genomes (e.g., antibodies or antibiotic-like non-ribosomal peptides) and propose to assemble them from tandem mass spectra. We show that our Eulerian approach to DNA sequencing can be generalized to Shotgun Protein Sequencing (SPS). We illustrate applications of SPS to de novo sequencing of antibodies (in collaboration with Jennie Lill at Genentech). We further show how multistage mass spectrometry enables high-throughput de novo sequencing of peptide-like natural products. This is joint work with Mark Chaisson and Dima Brinza (DNA fragment assembly), Nuno Bandeira (protein sequencing), and Julio Ng and Pieter Dorrestein (sequencing of natural products).

Pavel Pevzner

Department of Computer Science, Center for Algorithmic and Systems Biology, University of California at San Diego, USA

International conference

The 7th Asia-Pacific Bioinformatics Conference

Beijing

English

10

2009-01-01 (date first posted online on the Wanfang platform; not the paper's publication date)