      Is Open Access

      Bracken: estimating species abundance in metagenomics data

      research-article


          Abstract

          Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
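The re-estimation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified Bayesian rule in which reads that a classifier could place only at the genus level are divided among member species in proportion to (fraction of that species' genome expected to classify at the genus level) × (reads already assigned to that species). The function name, its arguments, and the example numbers are all hypothetical.

```python
from typing import Dict


def redistribute_genus_reads(
    species_reads: Dict[str, float],
    genus_reads: float,
    frac_at_genus: Dict[str, float],
) -> Dict[str, float]:
    """Split genus-level reads among species (illustrative sketch only).

    species_reads: reads classified directly at each species (assumed input).
    genus_reads: reads classified only at the shared genus node.
    frac_at_genus: assumed fraction of each species' genome whose reads
        would be classified at the genus level rather than the species level.
    """
    # Unnormalized posterior weight: likelihood of a genus-level call
    # given the species, times a prior proportional to species-level reads.
    weights = {
        s: frac_at_genus.get(s, 0.0) * n for s, n in species_reads.items()
    }
    total = sum(weights.values())
    if total == 0:
        # No evidence to split on: leave the genus-level reads unassigned.
        return dict(species_reads)
    # Each species keeps its own reads and gains its share of genus reads.
    return {
        s: species_reads[s] + genus_reads * (w / total)
        for s, w in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for two near-identical species in one genus.
    estimates = redistribute_genus_reads(
        species_reads={"species_A": 300, "species_B": 100},
        genus_reads=200,
        frac_at_genus={"species_A": 0.5, "species_B": 0.5},
    )
    print(estimates)
```

In this sketch the total read count is preserved whenever at least one species has a nonzero weight, so the output can be normalized to relative abundances afterwards.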

          Most cited references (13)


          Metagenomics: genomic analysis of microbial communities.

          Uncultured microorganisms comprise the majority of the planet's biological diversity. Microorganisms represent two of the three domains of life and contain vast diversity that is the product of an estimated 3.8 billion years of evolution. In many environments, as many as 99% of the microorganisms cannot be cultured by standard techniques, and the uncultured fraction includes diverse organisms that are only distantly related to the cultured ones. Therefore, culture-independent methods are essential to understand the genetic diversity, population structure, and ecological roles of the majority of microorganisms. Metagenomics, or the culture-independent genomic analysis of an assemblage of microorganisms, has potential to answer fundamental questions in microbial ecology. This review describes progress toward understanding the biology of uncultured Bacteria, Archaea, and viruses through metagenomic analyses.

            Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis--one species on the basis of genetic evidence.

            Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are members of the Bacillus cereus group of bacteria, demonstrating widely different phenotypes and pathological effects. B. anthracis causes the acute fatal disease anthrax and is a potential biological weapon due to its high toxicity. B. thuringiensis produces intracellular protein crystals toxic to a wide number of insect larvae and is the most commonly used biological pesticide worldwide. B. cereus is a probably ubiquitous soil bacterium and an opportunistic pathogen that is a common cause of food poisoning. In contrast to the differences in phenotypes, we show by multilocus enzyme electrophoresis and by sequence analysis of nine chromosomal genes that B. anthracis should be considered a lineage of B. cereus. This determination is not only a formal matter of taxonomy but may also have consequences with respect to virulence and the potential of horizontal gene transfer within the B. cereus group.
              Is Open Access

              Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

              Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.

                Author and article information

                Journal
                PeerJ Computer Science (PeerJ Comput. Sci.)
                PeerJ Inc. (San Francisco, USA)
                ISSN: 2376-5992
                Published: 2 January 2017
                Volume: 3
                Article number: e104
                Affiliations
                [1] Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
                [2] Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, United States
                [3] Applied Physics Laboratory, Johns Hopkins University, Laurel, MD, United States
                [4] Departments of Computer Science and Biostatistics, Johns Hopkins University, Baltimore, MD, United States
                Article
                Publisher ID: cs-104
                DOI: 10.7717/peerj-cs.104
                ©2017 Lu et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

                History
                Received: 7 July 2016
                Accepted: 28 November 2016
                Funding
                Funded by: US National Institutes of Health
                Award ID: R01-HG006677
                Award ID: R01-GM083873
                Funded by: US Army Research Office
                Award ID: W911NF-1410490
                This work was supported in part by the US National Institutes of Health R01-HG006677 and R01-GM083873 and by the US Army Research Office W911NF-1410490. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Bioinformatics
                Computational Biology

                Computer science
                Metagenomics, Species abundance, Microbiome, Bayesian estimation
