      Is Open Access

      MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies

      research-article


          Abstract

          We previously reported on MetaBAT, an automated metagenome binning software tool that reconstructs single genomes from microbial communities for subsequent analyses of uncultivated microbial species. MetaBAT has become one of the most popular binning tools, largely due to its computational efficiency and ease of use, especially in binning experiments with a large number of samples and a large assembly. MetaBAT requires users to choose parameters to fine-tune its sensitivity and specificity. If those parameters are not chosen properly, binning accuracy can suffer, especially on assemblies of poor quality. Here, we developed MetaBAT 2 to overcome this problem. MetaBAT 2 uses a new adaptive binning algorithm to eliminate manual parameter tuning. We also performed extensive software engineering optimization to increase both computational and memory efficiency. A comparison of MetaBAT 2 with alternative software tools on over 100 real-world metagenome assemblies shows superior accuracy and computing speed. Binning a typical metagenome assembly takes only a few minutes on a single commodity workstation. We therefore recommend that the community adopt MetaBAT 2 for metagenome binning experiments. MetaBAT 2 is open-source software and is available at https://bitbucket.org/berkeleylab/metabat.

                Author and article information

                Journal: PeerJ
                Publisher: PeerJ Inc. (San Diego, USA)
                ISSN: 2167-8359
                Published: 26 July 2019
                Volume: 7
                Article: e7359
                Affiliations
                [1] Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
                [2] School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
                [3] Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
                [4] School of Natural Sciences, University of California at Merced, Merced, CA, USA
                Author information
                http://orcid.org/0000-0003-1974-1768
                http://orcid.org/0000-0001-7422-5649
                http://orcid.org/0000-0001-7080-7801
                http://orcid.org/0000-0002-6307-0458
                Article
                7359
                DOI: 10.7717/peerj.7359
                PMCID: PMC6662567
                PMID: 31388474
                15d96f98-624b-44ab-abeb-aeda7c329c2f
                Copyright © 2019

                This is an open access article, free of all copyright, made available under the Creative Commons Public Domain Dedication. This work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

                History
                Received: 6 February 2019
                Accepted: 26 June 2019
                Funding
                Funded by: U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research
                Award ID: DE-AC02-05CH11231
                Funded by: China Scholarship Council (CSC)
                Dongwan Kang, Edward Kirton, Ashleigh Thomas, Rob Egan, and Zhong Wang’s work was supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under Contract No. DE-AC02-05CH11231. Feng Li was supported by an exchange student fellowship from the China Scholarship Council (CSC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Bioinformatics
                Computational Biology
                Genomics
                Microbiology
                Statistics

                Keywords: metagenomics, metagenome binning, clustering
