      Is Open Access

      Mining, analyzing, and integrating viral signals from metagenomic data

      research-article


          Abstract

          Background

          Viruses are important components of microbial communities, modulating community structure and function; however, only a couple of tools are currently available for phage identification and analysis from metagenomic sequencing data. Here we employed the random forest algorithm to develop VirMiner, a web-based phage contig prediction tool that is especially sensitive for high-abundance phage contigs and was trained and validated on paired metagenomic and phagenomic sequencing data from the human gut flora.

          Results

          VirMiner achieved 41.06% ± 17.51% sensitivity and 81.91% ± 4.04% specificity in the prediction of phage contigs. For high-abundance phage contigs in particular, VirMiner reached a much higher sensitivity (65.23% ± 16.94%) than VirFinder (34.63% ± 17.96%) and VirSorter (18.75% ± 15.23%) at almost the same specificity. Moreover, VirMiner provides the most comprehensive phage analysis pipeline, comprising raw metagenomic read processing, functional annotation, phage contig identification, and phage-host relationship prediction (CRISPR-spacer recognition). It also supports two-group comparison when the input metagenomic sequence data include different conditions (e.g., case and control). Application of VirMiner to an independent cohort of human gut metagenomes obtained from individuals treated with antibiotics revealed that 122 KEGG orthology and 118 Pfam groups differed significantly in abundance between the pre-treatment samples and the samples taken at the end of antibiotic administration, including groups related to clustered regularly interspaced short palindromic repeats (CRISPR), multidrug resistance, and protein transport. The VirMiner webserver is available at http://sbb.hku.hk/VirMiner/.
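The sensitivity and specificity figures above are reported as a mean ± standard deviation. As a reading aid, the sketch below shows one way such per-sample metrics could be computed and summarized. It is a minimal illustration, not the authors' evaluation code: the function names, the toy labels, and the choice of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev


def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two parallel lists of booleans.

    truth[i] is True when contig i is a real phage contig;
    predicted[i] is True when the classifier called it a phage contig.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # phage, called phage
    fn = sum(1 for t, p in pairs if t and not p)      # phage, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-phage, rejected
    fp = sum(1 for t, p in pairs if not t and p)      # non-phage, called phage
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def summarize(values):
    """Return (mean, sample standard deviation) of a list of metric values."""
    return mean(values), stdev(values)


# Hypothetical toy data: two samples, five contigs each (not from the paper).
toy_samples = {
    "sample_A": ([True, True, False, False, False],
                 [True, False, False, False, True]),
    "sample_B": ([True, True, True, False, False],
                 [True, True, False, False, False]),
}

per_sample = {name: sensitivity_specificity(t, p)
              for name, (t, p) in toy_samples.items()}
sens_mean, sens_sd = summarize([s for s, _ in per_sample.values()])
spec_mean, spec_sd = summarize([s for _, s in per_sample.values()])
print(f"sensitivity: {sens_mean:.2%} ± {sens_sd:.2%}")
print(f"specificity: {spec_mean:.2%} ± {spec_sd:.2%}")
```

Sensitivity here is TP / (TP + FN) and specificity is TN / (TN + FP). Whether the published values were averaged per sample or per validation fold is not stated in the abstract.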

          Conclusions

          We developed a comprehensive tool for phage prediction and analysis of metagenomic samples. Compared to VirSorter and VirFinder, the most widely used tools, VirMiner is able to capture more high-abundance phage contigs, which could play key roles in infecting bacteria and modulating microbial community dynamics.
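The phage-host relationship prediction mentioned in the Results relies on CRISPR-spacer recognition: a spacer stored in a bacterial CRISPR array that matches a phage contig suggests that the bacterium has encountered that phage. The sketch below illustrates the idea with exact substring matching on both strands. It is a simplified assumption for illustration only, since the abstract does not describe how VirMiner performs this step, and real pipelines typically use an alignment tool that tolerates mismatches. All names and sequences are hypothetical, and the toy spacers are much shorter than real ones.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def link_hosts_to_phages(spacers_by_host, phage_contigs):
    """Return sorted (host, contig_id) pairs where a spacer matches a contig.

    A match is an exact occurrence of the spacer, or of its reverse
    complement, anywhere in the contig sequence.
    """
    links = set()
    for host, spacers in spacers_by_host.items():
        for spacer in spacers:
            forward = spacer.upper()
            reverse = reverse_complement(forward)
            for contig_id, contig in phage_contigs.items():
                sequence = contig.upper()
                if forward in sequence or reverse in sequence:
                    links.add((host, contig_id))
    return sorted(links)


# Hypothetical toy input (not data from the paper).
toy_spacers = {
    "host1": ["GATTACAGG"],
    "host2": ["AAGCTTGCC"],  # matches contig_2 on the reverse strand
}
toy_contigs = {
    "contig_1": "CCCGATTACAGGTTT",
    "contig_2": "TTTGGCAAGCTTAAA",
}

for host, contig_id in link_hosts_to_phages(toy_spacers, toy_contigs):
    print(host, "->", contig_id)
```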

          Trial registration

          The European Union Clinical Trials Register, EudraCT Number: 2013-003378-28. Registered on 9 April 2014.

          Electronic supplementary material

          The online version of this article (10.1186/s40168-019-0657-y) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                tingting.zheng@hku.hk
                jun.li@cityu.edu.hk
                yueqiong.ni@connect.hku.hk
                kkang@connect.hku.hk
                maanmi@biosustain.dtu.dk
                lejim@biosustain.dtu.dk
                bkcc@hku.hk
                aala@regionsjaelland.dk
                pmby@regionsjaelland.dk
                (45) 2151 8340, msom@bio.dtu.dk
                (49) 3641 532-1759, gipa@hku.hk, Gianni.Panagiotou@hki-jena.de
                Journal
                Microbiome
                BioMed Central (London)
                ISSN 2049-2618
                19 March 2019
                2019; 7: 42
                Affiliations
                [1] ISNI 0000 0001 2174 2757, GRID grid.194645.b, Systems Biology & Bioinformatics Group, School of Biological Sciences, Faculty of Sciences, The University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
                [2] ISNI 0000 0004 1792 6846, GRID grid.35030.35, Department of Infectious Diseases and Public Health, The Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
                [3] ISNI 0000 0004 1792 6846, GRID grid.35030.35, School of Data Science, City University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
                [4] ISNI 0000 0001 0143 807X, GRID grid.418398.f, Department of Systems Biology and Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Beutenbergstraße 11a, 07745 Jena, Germany
                [5] ISNI 0000 0001 2181 8870, GRID grid.5170.3, Bacterial Synthetic Biology Section, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
                [6] ISNI 0000 0001 2174 2757, GRID grid.194645.b, School of Biological Sciences, Faculty of Science, The University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
                [7] GRID grid.476266.7, Department of Medicine, Zealand University Hospital, Køge, Denmark
                [8] ISNI 0000 0001 0674 042X, GRID grid.5254.6, Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
                [9] ISNI 0000 0001 2174 2757, GRID grid.194645.b, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, Hong Kong, Special Administrative Region of China
                Article
                657
                10.1186/s40168-019-0657-y
                6425642
                30890181
                0f50c6e5-5d62-48c3-93e2-f1d9814c9622
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                4 October 2018
                7 March 2019
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft
                Award ID: H2020-MSCA-ITN-2018 813781
                Funded by: FundRef http://dx.doi.org/10.13039/100010663, H2020 European Research Council
                Award ID: 638902
                Funded by: the European Union FP7 (Health-2011-single-stage)
                Award ID: 282004
                Funded by: The Lundbeck Foundation
                Award ID: R140-2013-13496
                Funded by: the Novo Nordisk Foundation
                Award ID: NNF10CC1016517
                Categories
                Methodology

                phage, metagenome, phage-host interaction, antibiotics
