From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

Abstract

Motivation

Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler.

Results

Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information, and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.

Availability

SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website: http://isa-tools.github.io/soapdenovo2/. Contact: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.

Author and article information

Contributors

Neil R. Smalheiser: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2015

Publication date (Electronic): 8 July 2015

Volume: 10

Issue: 7

Electronic Location Identifier: e0127612

Affiliations

[1] Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom

[2] GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China

[3] InfoLab21, Lancaster University, Bailrigg, Lancaster, LA1 4WA, United Kingdom

[4] Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital, Headley Way, Headington, Oxford, OX3 9DU, United Kingdom

[5] Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands

[6] HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China

[7] School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong, People’s Republic of China

University of Illinois-Chicago, UNITED STATES

Author notes

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: SAS AGB PRS JZ MR. Performed the experiments: AGB PRS PL. Analyzed the data: AGB PRS PL. Contributed reagents/materials/analysis tools: RL TWL TLL PL. Wrote the paper: PRS AGB SAS PL JZ MR SCE MSAG. Proposed the idea after an initial meeting with JZ, MR, AGB, and PRS: SAS. Selected the publication and worked with its authors (RL, TWL): SCE PL. Did ISA-Tab, linkedISA RDF, NPs representation and SPARQL queries over linkedISA and NPs: PRS AGB. Reviewed the NPs: MR MT. Submitted terms to OBI: PRS. Wrote linkedISA, NanoMaton software and prepared dedicated website and triple store: AGB. Re-implemented the published SOAPdenovo2 analyses as Galaxy workflows with help from SCE, RL, and TWL: PL TLL. Created the Research Object with input from MSAG and PRS: JZ. Wrote the manuscript first draft: PRS. Contributed to the final version, read it, and approved it: PRS AGB PL JZ MSAG MR MT EH RK RL TLL TWL SCE SAS. Contributed to the review of the nanopublications produced by PRS and AGB: EH RK.

* E-mail: susanna.assunta-sansone@oerc.ox.ac.uk (SS) & philippe.rocca-serra@oerc.ox.ac.uk (PR)

Article

Publisher ID: PONE-D-14-54465

DOI: 10.1371/journal.pone.0127612

PMC ID: 4495984

PubMed ID: 26154165

SO-VID: f688505d-7fd5-4eff-a37c-2cec4455273c

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 4 December 2014

Date accepted: 16 April 2015

Page count

Figures: 3, Tables: 2, Pages: 20

Funding

SAS, PRS, and AGB received funding from the European Union Coordination of Standards in Metabolomics (COSMOS) FP7 E9RXDC00, the British Biotechnology and Biological Sciences Research Council BB/L024101/1, BB/I025840/1, and the University of Oxford e-Research Centre. The work done by PL, SCE, and TLL on GigaGalaxy and the implementation of the SOAPdenovo2 workflows was supported by funding from the joint Chinese University of Hong Kong (CUHK)/The Beijing Genome Institute (BGI) Innovation Institute of Trans-omics and School of Biomedical Sciences, The Chinese University of Hong Kong (CUHK) and the China National GeneBank (CNGB). MSAG and JZ are supported by the European Union Workflow4ever project (EU Wf4Ever STREP, 270129), funded under European Union Framework Program 7 (EU-FP7 ICT-2009.4.1). MR, MT, EH, and RK are supported by the European Union Workflow4ever project (EU Wf4Ever STREP, 270129) funded under European Union Framework Program 7 (EU-FP7 ICT-2009.4.1), the Innovative Medicines Initiative Joint Undertaking (IMI-JU) project Open PHACTS (grant agreement no. 115191), and the European Union RD-Connect (EU FP7/2007-2013, grant agreement no. 305444). SOAPdenovo2 was developed with the support of the State Key Development Program for Basic Research of China-973 Program (2011CB809203); National High Technology Research and Development Program of China-863 program (2012AA02A201); the National Natural Science Foundation of China (90612019); the Shenzhen Key Laboratory of Trans-omics Biotechnologies (CXB201108250096A); and the Shenzhen Municipal Government of China (JC201005260191A and CXB201108250096A). Tak-Wah Lam was partially supported by RGC General Research Fund 10612042.

Custom metadata

Data Availability: All data are available from GitHub (http://isa-tools.github.io/soapdenovo2/), the GitHub code repository (https://github.com/ISA-tools/soapdenovo2), GitHub/Zenodo (http://dx.doi.org/10.5281/zenodo.18403), and GigaScience’s GigaDB (http://dx.doi.org/10.5524/100148).

ScienceOpen disciplines: Uncategorized
