Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

Abstract

Background

Analyzing high-throughput genomics data is a complex and compute-intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best-practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet the demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.

Results

We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets, and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add or remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.

Conclusions

This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

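Related collections

The conundrum of Virtual Research Environments, Science Gateways, Virtual Laboratories, Collaboratories
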
Most cited references 21

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

Jeremy Goecks, Anton Nekrutenko, James E. Taylor (2010)

Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy http://usegalaxy.org, an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

The generic genome browser: a building block for a model organism system database.

Lincoln D. Stein, Christopher John Mungall, ShengQiang Shu … (2002)

The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org.

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data

Franck Rapaport, Raya Khanin, Yupu Liang … (2013)

A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.

Author and article information

Contributors

Christophe Antoniewski: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Electronic): 1932-6203

Publication date (Electronic): 26 October 2015

Publication date Collection: 2015

Volume: 10

Issue: 10

Electronic Location Identifier: e0140829

Affiliations

[1] Victorian Life Sciences Computation Initiative (VLSCI), University of Melbourne, Melbourne, Victoria, Australia

[2] Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America

[3] Centre for Computing and Informatics (CIR), Rudjer Boskovic Institute (RBI), Zagreb, Croatia

[4] Research Computing Centre, University of Queensland, Brisbane, Queensland, Australia

[5] Queensland Facility for Advanced Bioinformatics (QFAB), University of Queensland, Brisbane, Queensland, Australia

Editor affiliation: CNRS UMR7622 & University Paris 6 Pierre-et-Marie-Curie, France

Author notes

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: EA CS NG IM SG MP AL. Performed the experiments: EA CS NG IM DB MC SG YK MP RH AL. Analyzed the data: EA CS NG IM DB MC SG YK MP RH AL. Contributed reagents/materials/analysis tools: EA CS NG IM DB SG YK MP. Wrote the paper: EA CS NG AL.

* E-mail: alonie@unimelb.edu.au

Article

Publisher ID: PONE-D-15-22212

DOI: 10.1371/journal.pone.0140829

PMC ID: 4621043

PubMed ID: 26501966

SO-VID: e10911a7-1246-442c-b0eb-dd2301c99beb

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 21 May 2015

Date accepted: 29 September 2015

Page count

Figures: 5, Tables: 2, Pages: 20

Funding

This work was supported by a grant from the National eResearch Collaboration Tools and Resources project (NeCTAR; http://nectar.org.au). NeCTAR is an Australian Government Super Science project, financed by the Education Investment Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Custom metadata

Data Availability: All relevant data are within the paper.
