108
views
0
recommends
+1 Recommend
2 collections
    2
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      PlasmidTron: assembling the cause of phenotypes and genotypes from NGS data

      research-article

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Increasingly rich metadata are now being linked to samples that have been whole-genome sequenced. However, much of this information is ignored. This is because linking this metadata to genes, or regions of the genome, usually relies on knowing the gene sequence(s) responsible for the particular trait being measured and looking for its presence or absence in that genome. Examples of this would be the spread of antimicrobial resistance genes carried on mobile genetic elements (MGEs). However, although it is possible to routinely identify the resistance gene, identifying the unknown MGE upon which it is carried can be much more difficult if the starting point is short-read whole-genome sequence data. The reason for this is that MGEs are often full of repeats and so assemble poorly, leading to fragmented consensus sequences. Since mobile DNA, which can carry many clinically and ecologically important genes, has a different evolutionary history from the host, its distribution across the host population will, by definition, be independent of the host phylogeny. It is possible to use this phenomenon in a genome-wide association study to identify both the genes associated with the specific trait and also the DNA linked to that gene, for example the flanking sequence of the plasmid vector on which it is encoded, which follows the same patterns of distribution as the marker gene/sequence itself. We present PlasmidTron, which utilizes the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors, or geographical information, to identify traits that are likely to be present on DNA that can randomly reassort across defined bacterial populations. It is also possible to use this methodology to associate unknown genes/sequences (e.g. plasmid backbones) with a specific molecular signature or marker (e.g. resistance gene presence or absence) using PlasmidTron. PlasmidTron uses a k-mer-based approach to identify reads associated with a phylogenetically unlinked phenotype. These reads are then assembled de novo to produce contigs in a fast and scalable-to-large manner. PlasmidTron is written in Python 3 and is available under the open source licence GNU GPL3 from https://github.com/sanger-pathogens/plasmidtron.

          Related collections

          Most cited references8

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM

          Heng Li (2013)
          Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths from 70bp to a few megabases. For mapping 100bp sequences, BWA-MEM shows better performance than several state-of-art read aligners to date. Availability and implementation: BWA-MEM is implemented as a component of BWA, which is available at http://github.com/lh3/bwa. Contact: hengli@broadinstitute.org
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data

            To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84 % of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences and correctly predicted small plasmids, but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar categorizes contigs as plasmid-derived and does not bin the different plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall=0.36). PlasmidSPAdes and Recycler detected putative small plasmids ( 50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid

              The population of Salmonella enterica serovar Typhi (S. Typhi), the causative agent of typhoid fever, exhibits limited DNA sequence variation, which complicates efforts to rationally discriminate individual isolates. Here we utilize data from whole-genome sequences (WGS) of nearly 2,000 isolates sourced from over 60 countries to generate a robust genotyping scheme that is phylogenetically informative and compatible with a range of assays. These data show that, with the exception of the rapidly disseminating H58 subclade (now designated genotype 4.3.1), the global S. Typhi population is highly structured and includes dozens of subclades that display geographical restriction. The genotyping approach presented here can be used to interrogate local S. Typhi populations and help identify recent introductions of S. Typhi into new or previously endemic locations, providing information on their likely geographical source. This approach can be used to classify clinical isolates and provides a universal framework for further experimental investigations.
                Bookmark

                Author and article information

                Journal
                Microb Genom
                Microb Genom
                MGen
                Microbial Genomics
                Microbiology Society
                2057-5858
                March 2018
                12 March 2018
                12 March 2018
                : 4
                : 3
                : e000164
                Affiliations
                [ 1]Infection Genomics, Wellcome Sanger Institute, Wellcome Genome Campus , Hinxton, Cambridge, UK
                [ 2]Department of Medicine, University of Cambridge , Cambridge, UK
                [ 3]London School of Hygiene and Tropical Medicine , London, UK
                [ ]Present address: Quadram Institute Bioscience, Norwich Research Park, Norwich, UK.
                Author notes
                *Correspondence: Andrew J. Page, andrew.page@ 123456quadram.ac.uk
                Article
                mgen000164
                10.1099/mgen.0.000164
                5885016
                0d4165ad-4c38-46fc-9160-3693fd08930c
                © 2018 The Authors

                This is an open access article under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

                History
                : 19 October 2017
                : 26 February 2018
                Funding
                Funded by: Wellcome Trust
                Award ID: 098051
                Categories
                Methods Paper
                Genomic Methodologies: Genome-phenotype Association
                Custom metadata
                free

                plasmids,de novo assembly,genome-wide association study,mobile genetic elements,antimicrobial resistance

                Comments

                Comment on this article