Identification of Klebsiella capsule synthesis loci from whole genome data

      Klebsiella pneumoniae is a growing cause of healthcare-associated infections for which multi-drug resistance is a concern. Its polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible and many isolates are serologically non-typeable. Molecular typing techniques provide useful insights, but existing methods fail to take full advantage of the information in whole genome sequences. We investigated the diversity of the capsule synthesis loci (K-loci) among 2503 K. pneumoniae genomes. We incorporated analyses of full-length K-locus nucleotide sequences and also clustered protein-encoding sequences to identify, annotate and compare K-locus structures. We propose a standardized nomenclature for K-loci and present a curated reference database. A total of 134 distinct K-loci were identified, including 31 novel types. Comparative analyses indicated 508 unique protein-encoding gene clusters that appear to reassort via homologous recombination. Extensive intra- and inter-locus nucleotide diversity was detected among the wzi and wzc genes, indicating that current molecular typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K-loci based on full locus information extracted from whole genome sequences. This work highlights the extensive diversity of Klebsiella K-loci and the proteins that they encode. The nomenclature, reference database and novel typing method presented here will become essential resources for genomic surveillance and epidemiological investigations of this pathogen.


            Author and article information

            [1] Centre for Systems Genomics, University of Melbourne, Parkville, Australia
            [2] Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Australia
            [3] Infectious Diseases and Microbiology Unit, The Alfred Hospital, Melbourne, Australia
            [4] LimmaTech Biologics AG, Schlieren, Switzerland
            [5] The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
            [6] London School of Hygiene and Tropical Medicine, Keppel Street, London, UK
            Author notes
            Correspondence Kelly L. Wyres ( kwyres@ )
            Kathryn E. Holt ( kholt@ )

            All supporting data, code and protocols have been provided within the article or through supplementary data files.

            Journal: Microbial Genomics (Microb Genom)
            Publisher: Microbiology Society
            Published: 12 December 2016
            Volume: 2
            Issue: 12
            © 2016 The Authors

            This is an open access article under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

            Funded by: National Health and Medical Research Council
            Award ID: Fellowship #1061409
            Funded by: National Health and Medical Research Council
            Award ID: Project #1043822
            Funded by: Wellcome Trust (GB)
            Award ID: 098051
            Research Paper
            Microbial evolution and epidemiology: Population Genomics