Swarm: robust and fast clustering method for amplicon-based studies

Abstract

Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.
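The first phase described above, iterative growth of a cluster under a local threshold, can be illustrated with a small sketch. This is not Swarm's implementation. It assumes the local threshold is a maximum number of differences `d` (substitutions, insertions, or deletions) between an amplicon and any amplicon already in the cluster, and it omits the abundance-based refinement phase. The names `edit_distance` and `cluster_amplicons`, the input dictionary format, and the toy sequences are all hypothetical.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def cluster_amplicons(abundances, d=1):
    """Grow clusters iteratively using a local threshold d.

    abundances: dict mapping amplicon sequence -> read count
                (a hypothetical input format chosen for this sketch).
    Returns a list of clusters; each cluster lists its seed first.
    """
    # Seeds are tried from most to least abundant; ties are broken
    # alphabetically so the result does not depend on input order.
    pool = sorted(abundances, key=lambda s: (-abundances[s], s))
    unassigned = set(pool)
    clusters = []
    for seed in pool:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            # Compare against the current member, not against the seed:
            # the threshold is local, so clusters can grow by chaining.
            neighbours = sorted(
                s for s in unassigned if edit_distance(current, s) <= d
            )
            for s in neighbours:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


example = {"ACGT": 100, "ACGA": 10, "ACTA": 3, "TTTT": 50, "TTTA": 2}
clusters = cluster_amplicons(example)
print(clusters)
# [['ACGT', 'ACGA', 'ACTA'], ['TTTT', 'TTTA']]
```

In this toy input, `ACTA` differs from the seed `ACGT` by two changes but still joins its cluster through `ACGA`, which is what distinguishes a local threshold from a global one measured against a centroid. The all-against-all comparison is quadratic and only suitable for illustration.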

Author and article information

Contributors

Frédéric Mahé

Journal

Journal ID (nlm-ta): PeerJ

Journal ID (iso-abbrev): PeerJ

Journal ID (pmc): PeerJ

Journal ID (publisher-id): PeerJ

Title: PeerJ

Publisher: PeerJ Inc. (San Francisco, USA)

ISSN (Electronic): 2167-8359

Publication date (Electronic): 25 September 2014

Publication date (Collection): 2014

Volume: 2

Electronic Location Identifier: e593

Affiliations

[1] CNRS, UMR 7144, EPEP – Évolution des Protistes et des Écosystèmes Pélagiques, Station Biologique de Roscoff, Roscoff, France

[2] Sorbonne Universités, UPMC Univ Paris 06, UMR 7144, Station Biologique de Roscoff, Roscoff, France

[3] Department of Ecology, University of Kaiserslautern, Kaiserslautern, Germany

[4] Department of Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway

[5] Department of Informatics, University of Oslo, Oslo, Norway

[6] School of Engineering, University of Glasgow, Glasgow, UK

Article

Publisher ID: 593

DOI: 10.7717/peerj.593

PMC ID: 4178461

PubMed ID: 25276506

SO-VID: 5318d98d-5f01-45f1-82c6-45a105a70ceb

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

History

Date received: 13 May 2014

Date accepted: 3 September 2014

Funding

Funded by: EU EraNet BiodivErsA program BioMarKs

Award ID: 2008-6530

Funded by: French government “Investissements d’Avenir” project OCEANOMICS

Award ID: ANR-11-BTBR-0008

Funded by: Deutsche Forschungsgemeinschaft

Award ID: DU1319/1-1

Funded by: EPSRC Career Acceleration Fellowship

Award ID: EP/H003851/1

FM and CdeV were supported by the EU EraNet BiodivErsA program BioMarKs (grant #2008-6530), the French government “Investissements d’Avenir” project OCEANOMICS (ANR-11-BTBR-0008), and the EU FP7 program MicroB3 (contract number 287589). FM and MD were supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). TR was supported by a Centre of Excellence grant from the Research Council of Norway to CMBN. CQ is funded by an EPSRC Career Acceleration Fellowship (EP/H003851/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
