59
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Short read sequence typing (SRST): multi-locus sequence types from short reads

      product-review

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          Multi-locus sequence typing (MLST) has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven) to divide the population and is simple, robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context.

          Results

          We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping and an allele assignment score incorporating sequence coverage and variability, to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, Clonal Frame or PhyloViz, allowing easy comparison between novel genome data and MLST data. Alignment, fastq and pileup files can also be generated for novel alleles.

          Conclusions

          SRST is a novel software tool for accurate assignment of sequence types using short read data. Several uses for the tool are demonstrated, including quality control for high-throughput sequencing projects, plasmid MLST and analysis of genomic data during outbreak investigation. SRST is open-source, requires Python, BWA and SamTools, and is available from http://srst.sourceforge.net.

          Related collections

          Most cited references13

          • Record: found
          • Abstract: found
          • Article: not found

          eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data.

          The introduction of multilocus sequence typing (MLST) for the precise characterization of isolates of bacterial pathogens has had a marked impact on both routine epidemiological surveillance and microbial population biology. In both fields, a key prerequisite for exploiting this resource is the ability to discern the relatedness and patterns of evolutionary descent among isolates with similar genotypes. Traditional clustering techniques, such as dendrograms, provide a very poor representation of recent evolutionary events, as they attempt to reconstruct relationships in the absence of a realistic model of the way in which bacterial clones emerge and diversify to form clonal complexes. An increasingly popular approach, called BURST, has been used as an alternative, but present implementations are unable to cope with very large data sets and offer crude graphical outputs. Here we present a new implementation of this algorithm, eBURST, which divides an MLST data set of any size into groups of related isolates and clonal complexes, predicts the founding (ancestral) genotype of each clonal complex, and computes the bootstrap support for the assignment. The most parsimonious patterns of descent of all isolates in each clonal complex from the predicted founder(s) are then displayed. The advantages of eBURST for exploring patterns of evolutionary descent are demonstrated with a number of examples, including the simple Spain(23F)-1 clonal complex of Streptococcus pneumoniae, "population snapshots" of the entire S. pneumoniae and Staphylococcus aureus MLST databases, and the more complicated clonal complexes observed for Campylobacter jejuni and Neisseria meningitidis.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Multilocus sequence typing of bacteria.

            Multilocus sequence typing (MLST) was proposed in 1998 as a portable, universal, and definitive method for characterizing bacteria, using the human pathogen Neisseria meningitidis as an example. In addition to providing a standardized approach to data collection, by examining the nucleotide sequences of multiple loci encoding housekeeping genes, or fragments of them, MLST data are made freely available over the Internet to ensure that a uniform nomenclature is readily available to all those interested in categorizing bacteria. At the time of writing, over thirty MLST schemes have been published and made available on the Internet, mostly for pathogenic bacteria, although there are schemes for pathogenic fungi and some nonpathogenic bacteria. MLST data have been employed in epidemiological investigations of various scales and in studies of the population biology, pathogenicity, and evolution of bacteria. The increasing speed and reduced cost of nucleotide sequence determination, together with improved web-based databases and analysis tools, present the prospect of increasingly wide application of MLST.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain

              No single genealogical reconstruction or typing method currently encompasses all levels of bacterial diversity, from domain to strain. We propose ribosomal multilocus sequence typing (rMLST), an approach which indexes variation of the 53 genes encoding the bacterial ribosome protein subunits (rps genes), as a means of integrating microbial genealogy and typing. As with multilocus sequence typing (MLST), rMLST employs curated reference sequences to identify gene variants efficiently and rapidly. The rps loci are ideal targets for a universal characterization scheme as they are: (i) present in all bacteria; (ii) distributed around the chromosome; and (iii) encode proteins which are under stabilizing selection for functional conservation. Collectively, the rps loci exhibit variation that resolves bacteria into groups at all taxonomic and most typing levels, providing significantly more resolution than 16S small subunit rRNA gene phylogenies. A web-accessible expandable database, comprising whole-genome data from more than 1900 bacterial isolates, including 28 draft genomes assembled de novo from the European Bioinformatics Institute (EBI) sequence read archive, has been assembled. The rps gene variation catalogued in this database permits rapid and computationally non-intensive identification of the phylogenetic position of any bacterial sequence at the domain, phylum, class, order, family, genus, species and strain levels. The groupings generated with rMLST data are consistent with current nomenclature schemes and independent of the clustering algorithm used. This approach is applicable to the other domains of life, potentially providing a rational and universal approach to the classification of life that is based on one of its fundamental features, the translation mechanism.
                Bookmark

                Author and article information

                Journal
                BMC Genomics
                BMC Genomics
                BMC Genomics
                BioMed Central
                1471-2164
                2012
                24 July 2012
                : 13
                : 338
                Affiliations
                [1 ]Department of Microbiology and Immunology, The University of Melbourne, Melbourne, Australia
                [2 ]Department of Pathology, The University of Melbourne, Melbourne, Australia
                [3 ]NICTA Victoria Research Laboratory, Department of Computing & Information Systems, The University of Melbourne, Melbourne, Australia
                Article
                1471-2164-13-338
                10.1186/1471-2164-13-338
                3460743
                22827703
                c65fd4bc-738e-4565-abe2-1e47c00d1799
                Copyright ©2012 Inouye et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 3 April 2012
                : 24 July 2012
                Categories
                Software

                Genetics
                short read,population analysis,illumina,microbiology,bacteria,chromosome,plasmid,mlst,sequence analysis,outbreak

                Comments

                Comment on this article