VSEARCH: a versatile open source tool for metagenomics

Abstract

Background

VSEARCH is an open-source, free-of-charge, multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010), for which the source code is not publicly available, the algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use.

Methods

When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads.
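To make the two-stage search described above concrete, here is a minimal Python sketch, not VSEARCH's implementation: candidate targets are ranked by the number of distinct k-mers (words) they share with the query, and only the top candidates are then aligned to the query with full Needleman-Wunsch dynamic programming. The function names, the word length `k`, the `max_candidates` and `min_id` thresholds, and the scores (match +2, mismatch -4, linear gap -3) are illustrative assumptions. The sketch uses a linear rather than affine gap penalty, reports identity as matching columns divided by alignment columns (only one of several possible definitions), and omits the vectorisation and multithreading VSEARCH uses.

```python
def words(seq, k):
    """Return the set of distinct length-k words (k-mers) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query, targets, k=8, max_candidates=4):
    """Rank targets by the number of distinct k-mers shared with the query
    and return the indices of the best ones (zero-overlap targets dropped).
    k and max_candidates are hypothetical parameters for this sketch."""
    qwords = words(query, k)
    shared = [(len(qwords & words(t, k)), i) for i, t in enumerate(targets)]
    # Most shared words first; ties keep input order.
    shared.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for count, i in shared[:max_candidates] if count > 0]


def global_align(a, b, match=2, mismatch=-4, gap=-3):
    """Needleman-Wunsch global alignment with a linear gap penalty
    (illustrative scores). Returns (optimal score, identity)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j] (full DP matrix).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return score[n][m], (matches / columns if columns else 0.0)


def search(query, targets, k=8, min_id=0.97):
    """Prefilter by shared words, then align only the surviving candidates."""
    hits = []
    for i in prefilter(query, targets, k):
        s, identity = global_align(query, targets[i])
        if identity >= min_id:
            hits.append((i, s, identity))
    return hits


if __name__ == "__main__":
    targets = ["ACGTACGTAA", "TTTTTTTTTT", "ACGTACGTAC"]
    print(search("ACGTACGTAC", targets, k=4, min_id=0.85))
    # prints [(0, 14, 0.9), (2, 20, 1.0)]
```

The word-counting step is what keeps this affordable: the quadratic-time alignment runs only for the few targets that survive the cheap prefilter, rather than for every sequence in the database.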

Results

VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. Here, VSEARCH is shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0.
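Dereplication, one of the commands listed above, can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not VSEARCH's code: full-length dereplication merges identical sequences and counts their abundance, and prefix dereplication additionally merges a sequence into a longer one that begins with it. Upper-casing before comparison, the function names, and the tie-breaking rules are assumptions of this sketch.

```python
from collections import Counter


def derep_fulllength(seqs):
    """Collapse identical sequences into (sequence, abundance) pairs,
    most abundant first; ties keep order of first appearance."""
    # Assumption: comparison is case-insensitive, so sequences are upper-cased.
    counts = Counter(s.upper() for s in seqs)
    # Counter keeps insertion order and sorted() is stable.
    return sorted(counts.items(), key=lambda item: -item[1])


def derep_prefix(seqs):
    """Merge every unique sequence into a longer sequence that starts with it."""
    uniques = derep_fulllength(seqs)
    # Longest first, so each possible parent is kept before its prefixes are seen.
    by_length = sorted(uniques, key=lambda item: -len(item[0]))
    merged = {}
    for seq, abundance in by_length:
        # If several parents qualify, this sketch simply takes the first one found.
        parent = next((p for p in merged if p.startswith(seq)), None)
        if parent is None:
            merged[seq] = abundance
        else:
            merged[parent] += abundance
    return sorted(merged.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    reads = ["ACGT", "acgt", "ACGTTT", "GGCC", "ACGT"]
    print(derep_fulllength(reads))  # [('ACGT', 3), ('ACGTTT', 1), ('GGCC', 1)]
    print(derep_prefix(reads))      # [('ACGTTT', 4), ('GGCC', 1)]
```

Because the output is ordered by decreasing abundance, it can be passed directly to an abundance-ordered step such as clustering with abundance pre-sorting.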

Discussion

VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.


Author and article information

Journal: PeerJ (PeerJ Inc., San Francisco, USA), ISSN 2167-8359
Published: 18 October 2016, volume 4, e2584
Affiliations
[1] Department of Informatics, University of Oslo, Oslo, Norway
[2] Department of Microbiology, Oslo University Hospital, Oslo, Norway
[3] Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
[4] Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
[5] School of Engineering, University of Glasgow, Glasgow, United Kingdom
[6] Warwick Medical School, University of Warwick, Coventry, United Kingdom
[7] Department of Ecology, University of Kaiserslautern, Kaiserslautern, Germany
[8] UMR LSTM, CIRAD, Montpellier, France
Article: 2584
DOI: 10.7717/peerj.2584
PMCID: PMC5075697
PMID: 27781170
                ©2016 Rognes et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

History: 5 September 2016; 17 September 2016
Funding
UNINETT Sigma2 (award NN9383K)
Unilever
MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) (awards MR/L015080/1 and MR/M50161X/1)
Deutsche Forschungsgemeinschaft (award #DU1319/1-1)
This research was supported in part by computational resources at the University of Oslo provided by UNINETT Sigma2 project NN9383K and funded by the Research Council of Norway. BN was funded by a BBSRC CASE studentship supported by Unilever. CQ was funded through the MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project (MR/L015080/1) via a fellowship (MR/M50161X/1). FM was supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Categories: Biodiversity, Bioinformatics, Computational Biology, Genomics, Microbiology

Keywords: clustering, chimera detection, searching, masking, shuffling, parallelization, metagenomics, alignment, sequences, dereplication
