From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

      Motivation: Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler.

      Results: Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.

      Availability: SOAPdenovo2 scripts, data, and results are available through the GigaScience Database; the workflows are available from GigaGalaxy; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website.

      Most cited references 23


      Judgment under Uncertainty: Heuristics and Biases.

      A Tversky, D Kahneman (1974)
      This article described three heuristics that are employed in making judgements under uncertainty: (i) representativeness, which is usually employed when people are asked to judge the probability that an object or event A belongs to class or process B; (ii) availability of instances or scenarios, which is often employed when people are asked to assess the frequency of a class or the plausibility of a particular development; and (iii) adjustment from an anchor, which is usually employed in numerical prediction when a relevant value is available. These heuristics are highly economical and usually effective, but they lead to systematic and predictable errors. A better understanding of these heuristics and of the biases to which they lead could improve judgements and decisions in situations of uncertainty.

        Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

        Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy, an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

          SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

          Background: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. Findings: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genome. Conclusions: Benchmark using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive to other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower during the point of largest memory consumption.

            Author and article information

            [1] Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom
            [2] GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
            [3] InfoLab21, Lancaster University, Bailrigg, Lancaster, LA1 4WA, United Kingdom
            [4] Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital, Headley Way, Headington, Oxford, OX3 9DU, United Kingdom
            [5] Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
            [6] HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
            [7] School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong, People’s Republic of China
            University of Illinois-Chicago, UNITED STATES
            Author notes

            Competing Interests: The authors have declared that no competing interests exist.

            Conceived and designed the experiments: SAS AGB PRS JZ MR. Performed the experiments: AGB PRS PL. Analyzed the data: AGB PRS PL. Contributed reagents/materials/analysis tools: RL TWL TLL PL. Wrote the paper: PRS AGB SAS PL JZ MR SCE MSAG. Proposed the idea after an initial meeting with JZ, MR, AGB, and PRS: SAS. Selected the publication and worked with its authors (RL, TWL): SCE PL. Did ISA-Tab, linkedISA RDF, NPs representation and SPARQL queries over linkedISA and NPs: PRS AGB. Reviewed the NPs: MR MT. Submitted terms to OBI: PRS. Wrote linkedISA, NanoMaton software and prepared dedicated website and triple store: AGB. Re-implemented the published SOAPdenovo2 analyses as Galaxy workflows with help from SCE, RL, and TWL: PL TLL. Created the Research Object with input from MSAG and PRS: JZ. Wrote the manuscript first draft: PRS. Contributed to the final version, read it, and approved it: PRS AGB PL JZ MSAG MR MT EH RK RL TLL TWL SCE SAS. Contributed to the review of the nanopublications produced by PRS and AGB: EH RK.

            Role: Editor
            PLoS One
            Public Library of Science (San Francisco, CA, USA)
            8 July 2015
            Volume: 10
            Issue: 7

            This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

            Figures: 3, Tables: 2, Pages: 20
            SAS, PRS, and AGB received funding from the European Union Coordination of Standards in Metabolomics (COSMOS) FP7 E9RXDC00, the British Biotechnology and Biological Science Research Council BB/L024101/1, BB/I025840/1, and the University of Oxford e-Research Centre. The work done by PL, SCE, and TLL on GigaGalaxy and the implementation of the SOAPdenovo2 workflows were supported by funding from the joint Chinese University of Hong Kong (CUHK)/The Beijing Genome Institute (BGI) Innovation Institute of Trans-omics and School of Biomedical Sciences, The Chinese University of Hong Kong (CUHK) and the China National GeneBank (CNGB). MSAG and JZ are supported by the European Union Workflow4ever project (EU Wf4Ever STREP, 270129), funded under European Union Framework Program 7 (EU-FP7 ICT-2009.4.1). MR, MT, EH, and RK are supported by the European Union Workflow4ever project (EU Wf4Ever STREP, 270129) funded under European Union Framework Program 7 (EU-FP7 ICT-2009.4.1), the Innovative Medicines Initiative Joint Undertaking (IMI-JU) project Open PHACTS (grant agreement no. 115191), and the European Union RD-Connect (EU FP7/2007-2013, grant agreement no. 305,444). SOAPdenovo2 was developed with the support of the State Key Development Program for Basic Research of China-973 Program (2011CB809203); National High Technology Research and Development Program of China-863 program (2012AA02A201); the National Natural Science Foundation of China (90612019); the Shenzhen Key Laboratory of Trans-omics Biotechnologies (CXB201108250096A); and the Shenzhen Municipal Government of China (JC201005260191A and CXB201108250096A). Tak-Wah Lam was partially supported by RGC General Research Fund 10612042.
            Research Article
            Custom metadata
            All data are available from GitHub, a GitHub code repository, GitHub/Zenodo, and GigaScience’s GigaDB.