Investigation into the annotation of protocol sequencing steps in the sequence read archive

Abstract

Background

Producing high-throughput sequencing data from nucleic acid samples is a complex workflow: a series of protocol steps must be followed to prepare samples for next-generation sequencing. The bias introduced by several of these steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, has yet to be quantified.

Results

We examined the experimental metadata of the public repository Sequence Read Archive (SRA) to ascertain how well important sequencing steps are annotated in submissions to the database. We ran SQL queries against the SRAdb SQLite database generated by the Bioconductor consortium, searching for keywords that commonly occur in descriptions of key preparatory protocol steps. Partitioned over studies, 7.10%, 5.84% and 7.57% of all records had at least one keyword corresponding to fragmentation, ligation and enrichment, respectively. Only 4.06% of all records, partitioned over studies, had keywords for all three protocol steps (5.58% of all SRA records).
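The kind of keyword query described above can be sketched as follows. This is a minimal illustration, not the authors' actual queries: the keyword lists are hypothetical, the table name `experiment` and column `library_construction_protocol` are assumptions about the SRAdb schema, and the toy in-memory table stands in for the real SQLite file. It counts over all records only and does not reproduce the per-study partitioning.

```python
import sqlite3

# Hypothetical keyword stems; the article's real keyword lists are not given here.
KEYWORDS = {
    "fragmentation": ["fragment", "sonicat", "nebuliz"],
    "ligation": ["ligat", "adapter"],
    "enrichment": ["enrich", "pcr", "amplif"],
}


def step_condition(column, keywords):
    """Return an OR-clause of LIKE tests and the matching parameter list."""
    clause = " OR ".join(f"{column} LIKE ?" for _ in keywords)
    return f"({clause})", [f"%{kw}%" for kw in keywords]


def annotation_rates(conn, table="experiment",
                     column="library_construction_protocol"):
    """Percentage of records mentioning each step, and all three steps."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rates, clauses, params_all = {}, [], []
    for step, keywords in KEYWORDS.items():
        clause, params = step_condition(column, keywords)
        # SQLite's LIKE is case-insensitive for ASCII; NULL rows never match.
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {clause}", params
        ).fetchone()[0]
        rates[step] = 100.0 * count / total
        clauses.append(clause)
        params_all.extend(params)
    count_all = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE " + " AND ".join(clauses),
        params_all,
    ).fetchone()[0]
    rates["all_three"] = 100.0 * count_all / total
    return rates


def build_demo_db():
    """Toy stand-in for the SRAdb file (assumed table and column names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment (library_construction_protocol TEXT)"
    )
    rows = [
        ("DNA was fragmented by sonication, adapters ligated, then PCR enriched",),
        ("Fragmentation by nebulization",),
        (None,),
        ("Standard Illumina protocol",),
    ]
    conn.executemany("INSERT INTO experiment VALUES (?)", rows)
    return conn


if __name__ == "__main__":
    for step, pct in annotation_rates(build_demo_db()).items():
        print(f"{step}: {pct:.2f}%")
```

To run the same counts against a real copy of the database, `sqlite3.connect` would be pointed at the downloaded file, after checking the actual table and column names.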

Conclusions

The current level of annotation in the SRA inhibits systematic studies of the bias introduced by these protocol steps. As a consequence, meta-analyses and comparative studies based on these data carry a source of bias that cannot currently be quantified.

Electronic supplementary material

The online version of this article (doi:10.1186/s13742-015-0064-7) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Jamie Alnasir: Jamie.Al-Nasir.2013@live.rhul.ac.uk

Hugh P Shanahan: Hugh.Shanahan@rhul.ac.uk

Journal

Journal ID (nlm-ta): Gigascience

Journal ID (iso-abbrev): Gigascience

Title: GigaScience

Publisher: BioMed Central (London)

ISSN (Electronic): 2047-217X

Publication date (Electronic): 9 May 2015

Publication date PMC-release: 9 May 2015

Publication date Collection: 2015

Volume: 4

Electronic Location Identifier: 23

Affiliations

Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX UK

Article

Publisher ID: 64

DOI: 10.1186/s13742-015-0064-7

PMC ID: 4425880

PubMed ID: 25960871

SO-VID: cb747f48-a0b9-46bb-8cf6-163ab7366632

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 15 October 2014

Date accepted: 28 April 2015

Custom metadata

Keywords: next-generation sequencing, ligation, fragmentation, enrichment, protocol, metadata, experiment, annotation
