Conference session

The Influenza Virus Resource at the National Center for Biotechnology Information

The number of influenza virus sequences in public databases has increased considerably in recent years, thanks to efforts such as the Influenza Genome Sequencing Project, funded by the National Institute of Allergy and Infectious Diseases. The increasing number of sequences created two needs: specialized databases that are easy to search with virus-specific criteria, and fast sequence analysis tools that can handle large numbers of sequences. To address these needs, in 2004 the National Center for Biotechnology Information (NCBI) established the Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU) [1]. The resource builds upon influenza virus nucleotide, protein and coding region sequences contained in GenBank/EMBL/DDBJ. Every sequence in the influenza database is curated by an automated procedure, and NCBI staff ensure that the information presented is complete, accurate, and up-to-date. Sequences can easily be searched and sorted by properties associated with viruses. Search features such as "Full-length sequences only" and "Remove identical sequences" can greatly reduce the size of the dataset to be analyzed without losing important information. Complete genome sets of influenza viruses can also be retrieved to study reassortment among different virus isolates. Nucleotide, protein and coding region sequences can be downloaded for further analysis. The database is freely available and has been used as the foundation for several other influenza virus sequence databases. The Influenza Virus Resource provides sequence analysis tools that are integrated with the database, such as multiple sequence alignment and phylogenetic trees of protein or coding region sequences based on different metrics. Users can include their own sequences in these analyses, allowing them to quickly modify a dataset to optimize the analysis and obtain preliminary results.
To accommodate more sequences in a tree, an adaptive approach was used to present an aggregated tree, which can be easily manipulated by users. Sequences on the tree can be searched by the fields in the database, and the resulting sequences or groups will be highlighted. An influenza virus genome annotation tool is included in the resource to validate and predict protein sequences encoded by influenza viruses A and B. The output can be a feature table suitable for sequence submission to GenBank, a GenBank flat file, or the predicted protein sequences. This tool makes sequence submission to GenBank much easier, thus promoting data sharing among the influenza virus research community. The most common signature mutations that might confer drug resistance on the influenza virus can also be detected and reported by this tool. The Influenza Virus Resource includes links to protein structures, the Trace Archive, publications, and general information about flu viruses.
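
As an illustration of how a filter like "Remove identical sequences" can shrink a dataset without losing information, the following is a minimal sketch that works on sequences already downloaded in FASTA format. It is not the resource's own implementation; the function names `parse_fasta` and `collapse_identical` are hypothetical, and treating sequences as identical after upper-casing is an assumption made here for simplicity.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Assumption: case differences do not distinguish sequences.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collapse_identical(records):
    """Group records that share exactly the same sequence.

    Returns a list of (representative_header, sequence, all_headers),
    in order of first appearance, so no isolate name is discarded.
    """
    groups = {}
    for header, seq in records:
        groups.setdefault(seq, []).append(header)
    return [(headers[0], seq, headers) for seq, headers in groups.items()]


if __name__ == "__main__":
    # Toy data, made up for this example (not real influenza sequences).
    example = ">isolate_a\nATG\nCC\n>isolate_b\natgcc\n>isolate_c\nATGCA\n"
    for rep, seq, members in collapse_identical(parse_fasta(example)):
        print(rep, seq, members)
```

Keeping the full list of headers for each unique sequence means the reduced dataset can still be mapped back to every original isolate after alignment or tree building.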

Yiming Bao, Dmitry Dernovoy, Boris Kiryutin, Leonid Zaslavsky, Tatiana Tatusova, Jim Ostell, David Lipman

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA

International conference

The 7th Asia-Pacific Bioinformatics Conference

Beijing

English

891

2009-01-01 (date first posted online on the Wanfang platform; not the paper's publication date)