Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud

Abstract:

As bioinformatics datasets grow ever larger and analyses become increasingly complex, data-handling infrastructures must keep pace with developing technology. One solution is to apply Grid and Cloud technologies to meet the computational requirements of analysing high-throughput datasets. We present an approach for writing new applications or wrapping existing ones, together with a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow that executed simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. This system performed several CPU-years' worth of computational work in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

Authors: Keith Flanagan, Sirintra Nakjang, Jennifer Hallinan, Colin Harwood, Robert P. Hirt, Matthew R. Pocock, Anil Wipat

Affiliations: School of Computing Science, Newcastle University, Newcastle upon Tyne, NE7 4RU, UK; Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, NE7 4RU, UK

Conference type: International conference

Conference name: International Symposium on Integrative Bioinformatics, 8th Annual Meeting (IB 2012)

Conference location: Hangzhou

Conference language: English

Pages: 4-16

Online publication date: 2012-04-02 (the date the record first appeared on the Wanfang platform, not the paper's publication date)
