
      Comprehensive and accurate genetic variant identification from contaminated and low-coverage Mycobacterium tuberculosis whole genome sequencing data

      research-article


          Abstract

          An improved understanding of the genomic variants that allow Mycobacterium tuberculosis (Mtb) to acquire drug resistance or tolerance and to increase its virulence is important for controlling the current tuberculosis epidemic. Current approaches to Mtb sequencing, however, cannot reveal Mtb’s full genomic diversity because they require low contamination levels, high Mtb sequence coverage and the exclusion of complex regions. We have developed the XBS (compleX Bacterial Samples) bioinformatics pipeline, which implements joint calling and machine-learning-based variant filtering tools specifically to improve variant detection in the important Mtb samples that do not meet these criteria, such as those from unbiased sputum samples. Using novel simulated datasets, which permit exact accuracy verification, XBS was compared to the UVP and MTBseq pipelines. Accuracy statistics showed that all three pipelines performed equally well for sequence data resembling those obtained from culture isolates, with high depth of coverage and low-level contamination. In the complex genomic regions, however, XBS accurately identified 9.0% more SNPs and 8.1% more single nucleotide insertions and deletions than the WHO-endorsed unified analysis variant pipeline. XBS also had superior accuracy for sequence data resembling those obtained directly from sputum samples, where depth of coverage is typically very low and contamination levels are high. XBS was the only pipeline not affected by low depth of coverage (5–10×), type of contamination or excessive contamination levels (>50%).
Simulation results were confirmed using whole genome sequencing (WGS) data from clinical samples: XBS showed a higher sensitivity (98.8%) when analysing culture isolates and identified 13.9% more variable sites than MTBseq in WGS data from sputum samples, without evidence of false-positive variants when rRNA regions were excluded. The XBS pipeline facilitates sequencing of less-than-perfect Mtb samples. These advances will benefit future clinical applications of Mtb sequencing, especially WGS directly from clinical specimens, thereby avoiding in vitro biases and making many more samples available for drug resistance and other genomic analyses. The additional genetic resolution and increased sample success rate will improve genome-wide association studies and sequence-based transmission studies.
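The abstract notes that simulated datasets permit exact accuracy verification, because the true variants are known in advance. The following is a minimal sketch of that idea, not the evaluation code used for XBS: it assumes variants are represented as (position, reference allele, alternate allele) tuples, and the names `Variant`, `Accuracy` and `score_calls` as well as the example data are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant representation: (1-based position, ref allele, alt allele)
Variant = Tuple[int, str, str]


class Accuracy(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    sensitivity: float
    precision: float


def score_calls(truth: Set[Variant], called: Set[Variant]) -> Accuracy:
    """Compare a pipeline's calls against the known (simulated) truth set."""
    tp = len(truth & called)   # variants that were simulated and called
    fp = len(called - truth)   # called variants that were never simulated
    fn = len(truth - called)   # simulated variants the pipeline missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return Accuracy(tp, fp, fn, sensitivity, precision)


# Illustrative data only: four simulated variants, four calls, three shared.
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA"), (900, "T", "C")}
called = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA"), (1200, "A", "C")}

result = score_calls(truth, called)
print(result)
```

Exact tuple matching is the simplest possible comparison. A real evaluation would likely need to normalise indel representations before matching, since the same insertion or deletion can be written at different positions.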


                Author and article information

                Journal
                Microb Genom
                mgen
                Microbial Genomics
                Microbiology Society
                2057-5858
                18 November 2021
                Volume: 7
                Issue: 11
                Article number: 000689
                Affiliations
                [1] Family Medicine and Population Health (FAMPOP), Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
                [2] South African Medical Research Council Centre for Tuberculosis Research and DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Stellenbosch University, Stellenbosch, South Africa
                Author notes
                *Correspondence: Tim H. Heupink, tim.heupink@uantwerpen.be

                Simulated sequencing data have been deposited in SRA BioProject PRJNA706121.

                Author information
                https://orcid.org/0000-0001-6237-3898
                https://orcid.org/0000-0002-5647-5852
                https://orcid.org/0000-0001-5741-7358
                https://orcid.org/0000-0001-7666-3263
                Article
                000689
                DOI: 10.1099/mgen.0.000689
                PMCID: PMC8743552
                PMID: 34793294
                © 2021 The Authors

                This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License.

                History
                Received: 09 March 2021
                Accepted: 09 September 2021
                Funding
                Funded by: Fonds Wetenschappelijk Onderzoek
                Award ID: G0F8316N
                Award Recipient: Annelies Van Rie
                Categories
                Research Articles
                Pathogens and Epidemiology

                Keywords: bacteria, contamination, low coverage, Mycobacterium tuberculosis, sputum, whole-genome sequencing
