Ten simple rules for making research software more robust

Abstract

Software produced for research, published and otherwise, suffers from a number of common problems that make it difficult or impossible to run outside the original institution or even off the primary developer’s computer. We present ten simple rules to make such software robust enough to be run by anyone, anywhere, and thereby delight your users and collaborators.

Author summary

Many researchers have found out the hard way that there’s a world of difference between “works for me on my machine” and “works for other people on theirs.” Many common challenges can be avoided by following a few simple rules; doing so not only improves reproducibility but can accelerate research.

Author and article information

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (iso-abbrev): PLoS Comput. Biol

Journal ID (publisher-id): plos

Journal ID (pmc): ploscomp

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date Collection: April 2017

Publication date (Electronic): 13 April 2017

Volume: 13

Issue: 4

Electronic Location Identifier: e1005412

Affiliations

[1] Genome Sequence Informatics, Ontario Institute for Cancer Research, Toronto, Ontario, Canada

[2] Software Carpentry Foundation, Austin, Texas, United States of America

Author notes

The authors have declared that no competing interests exist.

* E-mail: morgan.taschuk@oicr.on.ca

Author information

Morgan Taschuk http://orcid.org/0000-0003-0677-6902

Greg Wilson http://orcid.org/0000-0001-8659-8979

Article

Publisher ID: PCOMPBIOL-D-16-01683

DOI: 10.1371/journal.pcbi.1005412

PMC ID: 5390961

PubMed ID: 28407023

SO-VID: a5db6498-b8f5-4e28-8f5b-591a361273a9

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Page count

Figures: 0, Tables: 0, Pages: 10

Funding

This work was partially funded by the Ontario Institute for Cancer Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
