PANDAseq: paired-end assembler for illumina sequences


Abstract

Background

Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; using quality information, many errors could be corrected to obtain higher sequence yields.

Results

PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

Conclusions

PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
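
The overlap assembly described in the abstract can be illustrated with a minimal Python sketch. This is not PANDAseq's actual algorithm or scoring model, which the abstract does not detail. It is a simplified stand-in that assumes the amplicon is longer than either read, scores each candidate overlap as matches minus mismatches, and resolves each mismatch in favour of the base with the higher Phred quality. The function names and the `min_overlap` parameter are hypothetical.

```python
# Illustrative sketch only: a simplified overlap merger, NOT PANDAseq's method.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_score(a, b):
    """Matches minus mismatches; positions with an uncalled base (N) are skipped."""
    score = 0
    for x, y in zip(a, b):
        if "N" in (x, y):
            continue
        score += 1 if x == y else -1
    return score


def assemble_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=5):
    """Merge a forward read with its reverse mate.

    fwd_qual and rev_qual are lists of Phred scores, one per base.
    Returns the merged sequence, or None if no overlap scores above zero.
    """
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]

    # Try the longest overlap first; a shorter one must score strictly higher to win.
    best_overlap, best_score = None, 0
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        score = overlap_score(fwd[-overlap:], rc[:overlap])
        if score > best_score:
            best_overlap, best_score = overlap, score
    if best_overlap is None:
        return None

    start = len(fwd) - best_overlap
    merged = list(fwd[:start])  # forward-only prefix
    for i in range(best_overlap):
        a, qa = fwd[start + i], fwd_qual[start + i]
        b, qb = rc[i], rc_qual[i]
        if a == b or b == "N":
            merged.append(a)
        elif a == "N" or qb > qa:
            merged.append(b)  # uncalled or lower-quality forward base is replaced
        else:
            merged.append(a)
    merged.extend(rc[best_overlap:])  # reverse-only suffix
    return "".join(merged)


if __name__ == "__main__":
    template = "ACGTTGCAAGGCTTACCGATAGGCTA"
    fwd = template[:18]
    rev = reverse_complement(template[10:])
    print(assemble_pair(fwd, [30] * 18, rev, [30] * 16) == template)
```

In this toy example the two reads share an 8-base overlap, so the merged sequence reproduces the hypothetical template. A production assembler would also need to handle reads that extend past each other, primer trimming, and a principled probability model for the overlap.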

Most cited references

Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample.

J. Gregory Caporaso, Christian L. Lauber, William A. Walters … (2011)

The ongoing revolution in high-throughput sequencing continues to democratize the ability of small groups of investigators to map the microbial component of the biosphere. In particular, the coevolution of new sequencing platforms and new software tools allows data acquisition and analysis on an unprecedented scale. Here we report the next stage in this coevolutionary arms race, using the Illumina GAIIx platform to sequence a diverse array of 25 environmental samples and three known "mock communities" at a depth averaging 3.1 million reads per sample. We demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature (notably, the saline/nonsaline split in environmental samples and the split between host-associated and free-living communities). We also demonstrate that 2,000 Illumina single-end reads are sufficient to recapture the same relationships among samples that we observe with the full dataset. The results thus open up the possibility of conducting large-scale studies analyzing thousands of samples simultaneously to survey microbial communities at an unprecedented spatial and temporal resolution.

The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data

J. R. Cole, B. Chai, R. J. Farris … (2006)

Substantial new features have been implemented at the Ribosomal Database Project in response to the increased importance of high-throughput rRNA sequence analysis in microbial ecology and related disciplines. The most important changes include quality analysis, including chimera detection, for all available rRNA sequences and the introduction of myRDP Space, a new web component designed to help researchers place their own data in context with the RDP's data. In addition, new video tutorials describe how to use RDP features. Details about RDP data and analytical functions can be found at the RDP-II website.

Microbiome Profiling by Illumina Sequencing of Combinatorial Sequence-Tagged PCR Products

Gregory Gloor, Ruben Hummelen, Jean M Macklaim … (2010)

We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than is required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors composed the vast majority of non-organism derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3, V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2012

Publication date (Electronic): 14 February 2012

Volume: 13

Page: 31

Affiliations

[1] Department of Biology, University of Waterloo, Waterloo, Ontario, Canada

[2] David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

Article

Publisher ID: 1471-2105-13-31

DOI: 10.1186/1471-2105-13-31

PMC ID: 3471323

PubMed ID: 22333067

SO-VID: a16c3745-ac44-4dab-a10e-8ad648d49832

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
