
      RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification

      letter


          Abstract

          In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
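The effect described in the abstract can be illustrated with a toy k-mer index. The sketch below is not the authors' pipeline or any real classifier's code: the taxonomy, the sequences, and the names `PARENT`, `build_db`, and `classify` are hypothetical, and the scoring rule is a simplified root-to-leaf vote. It only shows why adding a second species to a genus can make a previously uncovered read classifiable while pushing a previously species-level read up to the genus.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy taxonomy (child -> parent); names are illustrative only.
PARENT = {"species_A": "genus_X", "species_B": "genus_X",
          "genus_X": "root", "root": None}

def lineage(taxon):
    """Path from a taxon up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa share no ancestor")

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_db(genomes, k):
    """Map each k-mer to the LCA of every genome that contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db

def classify(read, db, k):
    """Simplified vote: score each hit taxon by the k-mer hits on its
    root-to-leaf path, take the best score, and resolve ties by LCA."""
    counts = Counter(db[km] for km in kmers(read, k) if km in db)
    if not counts:
        return "unclassified"
    scores = {t: sum(counts[a] for a in lineage(t)) for t in counts}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    return reduce(lca, tied)

K = 5
old_genomes = {"species_A": "ACGTACGGTC"}               # older database version
new_genomes = {**old_genomes, "species_B": "ACGTACGGAA"}  # one species added
old_db, new_db = build_db(old_genomes, K), build_db(new_genomes, K)

print(classify("ACGTACGG", old_db, K))  # species_A
print(classify("ACGTACGG", new_db, K))  # genus_X
print(classify("ACGGAA", old_db, K))    # unclassified
print(classify("ACGGAA", new_db, K))    # species_B
```

In this toy case the newer database classifies one more read overall, but the read drawn from the region shared by both species drops from species level to genus level.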


          Most cited references (30)


          Bracken: estimating species abundance in metagenomics data

          Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.

            Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

            High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Those sequence contaminations are a serious concern to the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary and required step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets ( 150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. 
            DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/.

              Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis--one species on the basis of genetic evidence.

              Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are members of the Bacillus cereus group of bacteria, demonstrating widely different phenotypes and pathological effects. B. anthracis causes the acute fatal disease anthrax and is a potential biological weapon due to its high toxicity. B. thuringiensis produces intracellular protein crystals toxic to a wide number of insect larvae and is the most commonly used biological pesticide worldwide. B. cereus is a probably ubiquitous soil bacterium and an opportunistic pathogen that is a common cause of food poisoning. In contrast to the differences in phenotypes, we show by multilocus enzyme electrophoresis and by sequence analysis of nine chromosomal genes that B. anthracis should be considered a lineage of B. cereus. This determination is not only a formal matter of taxonomy but may also have consequences with respect to virulence and the potential of horizontal gene transfer within the B. cereus group.

                Author and article information

                Contributors
                treangen@rice.edu
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 30 October 2018
                Volume: 19
                Article number: 165
                Affiliations
                [1] Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA (ISNI 0000 0001 0941 7177; GRID grid.164295.d)
                [2] Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA (ISNI 0000 0001 2233 9230; GRID grid.280128.1)
                [3] Department of Computer Science, Rice University, Houston, TX, USA (ISNI 0000 0004 1936 8278; GRID grid.21940.3e)
                Author information
                http://orcid.org/0000-0002-3760-564X
                Article
                1554
                DOI: 10.1186/s13059-018-1554-6
                PMCID: PMC6206640
                PMID: 30373669
                d3bf3094-6a31-4856-ab5e-9d0987e8747d
                © The Author(s) 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 2 May 2018
                Accepted: 1 October 2018
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000183, Army Research Office;
                Award ID: W911NF-17-2-0089
                Categories
                Open Letter

                Genetics
                Keywords: taxonomic classification, reference database, metagenomics, microbiome, comparative analysis, k-mer, LCA
