Blog
About

80
views
0
recommends
+1 Recommend
0 collections
    4
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      VSEARCH: a versatile open source tool for metagenomics

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool ( Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use.

          Methods

          When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences, a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads.

          Results

          VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end reads merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0.

          Discussion

          VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.

          Related collections

          Most cited references 36

          • Record: found
          • Abstract: not found
          • Article: not found

          Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

           S Altschul (1997)
          The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            Fast and accurate short read alignment with Burrows–Wheeler transform

            Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              QIIME allows analysis of high-throughput community sequencing data.

                Bookmark

                Author and article information

                Contributors
                Journal
                PeerJ
                PeerJ
                peerj
                peerj
                PeerJ
                PeerJ Inc. (San Francisco, USA )
                2167-8359
                18 October 2016
                2016
                : 4
                Affiliations
                [1 ]Department of Informatics, University of Oslo , Oslo, Norway
                [2 ]Department of Microbiology, Oslo University Hospital , Oslo, Norway
                [3 ]Heidelberg Institute for Theoretical Studies , Heidelberg, Germany
                [4 ]Institute for Theoretical Informatics, Karlsruhe Institute of Technology , Karlsruhe, Germany
                [5 ]School of Engineering, University of Glasgow , Glasgow, United Kingdom
                [6 ]Warwick Medical School, University of Warwick , Coventry, United Kingdom
                [7 ]Department of Ecology, University of Kaiserslautern , Kaiserslautern, Germany
                [8 ]UMR LSTM, CIRAD , Montpellier, France
                Article
                2584
                10.7717/peerj.2584
                5075697
                ©2016 Rognes et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                Funding
                Funded by: UNINETT Sigma2
                Award ID: NN9383K
                Funded by: Unilever
                Funded by: MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB)
                Award ID: MR/L015080/1
                Award ID: MR/M50161X/1
                Funded by: Deutsche Forschungsgemeinschaft
                Award ID: #DU1319/1-1
                This research was supported in part with computational resources at the University of Oslo provided by UNINETT Sigma2 project NN9383K and funded by the Research Council of Norway. BN was funded by BBSRC CASE studentship supported by Unilever. CQ was funded through the MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project (MR/L015­080/1) through fellowship (MR/M50161X/1). FM was supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Biodiversity
                Bioinformatics
                Computational Biology
                Genomics
                Microbiology

                Comments

                Comment on this article