VSEARCH: a versatile open source tool for metagenomics


      Abstract

      Background

      VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use.

      Methods

      When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads.
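The two-stage search described above can be illustrated with a minimal Python sketch. It is not VSEARCH's implementation: the function names, the word length, the scoring values and the linear gap penalty are all assumptions chosen for brevity, and the vectorised, multithreaded alignment is not reproduced. Stage one ranks targets by the number of distinct words shared with the query; stage two scores a candidate with full dynamic programming (Needleman-Wunsch).

```python
def kmers(seq, k=8):
    """Return the set of distinct words (k-mers) of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_targets(query, targets, k=8, max_candidates=2):
    """Rank target names by the number of distinct words shared with the query.

    `targets` maps a name to a sequence. Targets sharing no word are dropped.
    Hypothetical helper; the word length and candidate limit are illustrative.
    """
    query_words = kmers(query, k)
    scored = [(len(query_words & kmers(seq, k)), name)
              for name, seq in targets.items()]
    # Most shared words first; ties broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for count, name in scored[:max_candidates] if count > 0]


def global_align_score(a, b, match=2, mismatch=-4, gap=-5):
    """Optimal global alignment score via full dynamic programming.

    Uses a linear gap penalty (a simplification) and keeps only two rows
    of the score matrix in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

In use, only the targets returned by `rank_targets` would be passed to `global_align_score`, so the expensive alignment step runs on a small number of promising candidates rather than on the whole database.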

      Results

      VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0.
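To make two of the operations named above concrete, here is a small Python sketch of full-length dereplication and reverse complementation. It only illustrates the concepts and does not reflect VSEARCH's code or output format; the function names are hypothetical, and case-insensitive comparison and the tie-breaking order are assumptions of this sketch.

```python
from collections import Counter

# Translation table for the four unambiguous nucleotides, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dereplicate_full_length(seqs):
    """Collapse identical sequences and count their abundance.

    Sequences are compared over their full length, ignoring case (an
    assumption of this sketch). Returns (sequence, abundance) pairs
    sorted by decreasing abundance, then alphabetically.
    """
    counts = Counter(s.upper() for s in seqs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, `dereplicate_full_length(["ACGT", "acgt", "TTTT"])` collapses the first two reads into a single entry with abundance 2, which is the kind of abundance information that abundance pre-sorted clustering relies on.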

      Discussion

      VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.

            Author and article information

            Affiliations
            [1] Department of Informatics, University of Oslo, Oslo, Norway
            [2] Department of Microbiology, Oslo University Hospital, Oslo, Norway
            [3] Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
            [4] Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
            [5] School of Engineering, University of Glasgow, Glasgow, United Kingdom
            [6] Warwick Medical School, University of Warwick, Coventry, United Kingdom
            [7] Department of Ecology, University of Kaiserslautern, Kaiserslautern, Germany
            [8] UMR LSTM, CIRAD, Montpellier, France
            Journal
            PeerJ
            Publisher: PeerJ Inc. (San Francisco, USA)
            ISSN: 2167-8359
            Published: 18 October 2016
            Volume: 4
            Article number: 2584
            PMCID: PMC5075697
            DOI: 10.7717/peerj.2584
            ©2016 Rognes et al.

            This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

            Funding
            Funded by: UNINETT Sigma2
            Award ID: NN9383K
            Funded by: Unilever
            Funded by: MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB)
            Award ID: MR/L015080/1
            Award ID: MR/M50161X/1
            Funded by: Deutsche Forschungsgemeinschaft
            Award ID: #DU1319/1-1
            This research was supported in part with computational resources at the University of Oslo provided by UNINETT Sigma2 project NN9383K and funded by the Research Council of Norway. BN was funded by a BBSRC CASE studentship supported by Unilever. CQ was funded through the MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project (MR/L015080/1) via a fellowship (MR/M50161X/1). FM was supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
            Categories
            Biodiversity
            Bioinformatics
            Computational Biology
            Genomics
            Microbiology
