
      Parallel Evolution of Streptococcus pneumoniae and Streptococcus mitis to Pathogenic and Mutualistic Lifestyles

      mBio
      American Society for Microbiology



          Abstract

INTRODUCTION
Streptococcus pneumoniae is a leading cause of pneumonia, meningitis, septicemia, and middle ear infections (1). According to data from the World Health Organization, S. pneumoniae is the fourth most frequent cause of fatal infections worldwide (2). Intriguingly, the species is not closely related to other overt streptococcal pathogens but clusters within the mitis group of streptococci, which are otherwise important members of the commensal microbiota of the oral cavity and pharynx (3, 4). The unique pathogenic potential of S. pneumoniae among the mitis group streptococci is explained by an array of virulence factors that enable escape from host immunity, such as the polysaccharide capsule and the IgA1 protease, in addition to surface-exposed proteins that enable adhesion to and destruction of host tissues (5, 6). Despite the relative conservation of its genome, some pneumococcal virulence factors show extensive structural diversity that ensures survival of the species after immunity has developed in response to infection or vaccination (5). One example is the capsular polysaccharide, which occurs in more than 90 distinct structures encoded by serotype-specific capsular polysaccharide biosynthesis (cps) operons; combined, these operons add up to the size of the complete pneumococcal genome (~2.1 Mb) (7). The 13 capsular polysaccharides most frequently associated with disease form the basis of a childhood vaccine currently implemented in most industrialized countries (8). However, frequent switching of capsular serotype (9–11) and the potential emergence of novel structures present a significant challenge to the continued successful prevention of pneumococcal infections.
Regulated natural competence for genetic transformation of pneumococci, combined with induced lysis of noncompetent members of the same species, enables frequent transfer of pathogenicity islands, exchange of complete virulence genes or fragments of them, and dissemination of antibiotic resistance within the species (12–17). In addition, recombination between S. pneumoniae, Streptococcus mitis, and Streptococcus oralis has been reported to be instrumental in the development and dissemination of resistance to beta-lactam antibiotics (18–20). We previously proposed an evolutionary model suggesting that the species S. pneumoniae, S. mitis, and the more recently described Streptococcus pseudopneumoniae arose from a pneumococcus-like organism pathogenic to the immediate ancestor of hominids (3). Because these species are almost exclusively adapted to humans and other hominids, their success is conceivably closely associated with the population size of susceptible hosts. Here we present evidence supporting this evolutionary model and demonstrate the genetic basis by which two distinct but successful bacterial lifestyles evolved in parallel within the same host. The pathogenic lifestyle of the pneumococcus, which depends on continued import of genes from neighboring species, results in antigenic diversity that will continue to challenge the prevention of pneumococcal infections.
RESULTS
Phylogenetic relationships based on core genome sequences. To shed light on the genetic processes that shaped the genomes of S. pneumoniae and its close commensal relatives, we analyzed newly available genome sequences. Alignment of 35 genomes of S. pneumoniae, S. mitis, S. pseudopneumoniae, S. oralis, and Streptococcus infantis (see Table S1 in the supplemental material) identified a core of 822,537 nucleotides (nt). Of the sites in this concatenated sequence, 292,227 (35.5%) were polymorphic, and 240,553 of these were parsimony informative (i.e., with variants present in more than one strain).
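The polymorphic-site counts above can be reproduced on any column alignment with a short script. The following is a minimal Python sketch, not the pipeline used in the study (which relied on Mugsy and MEGA); the function name `site_summary` and the toy alignment are illustrative assumptions. It also assumes the conventional definition of a parsimony-informative site (at least two different nucleotides, each present in at least two sequences), which is stricter than the short gloss in the text, and it ignores gaps and ambiguity codes column by column.

```python
from collections import Counter

NUCLEOTIDES = frozenset("ACGT")


def site_summary(alignment):
    """Return (alignment length, polymorphic sites, parsimony-informative sites).

    `alignment` is a list of equal-length nucleotide strings, one per genome.
    Gaps and ambiguity codes are ignored column by column (an assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    polymorphic = informative = 0
    for column in zip(*alignment):
        # Tally only unambiguous nucleotides in this column.
        counts = Counter(b.upper() for b in column if b.upper() in NUCLEOTIDES)
        if len(counts) < 2:
            continue  # monomorphic, or only gaps/ambiguous bases
        polymorphic += 1
        # Conventional criterion: at least two states, each seen in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return length, polymorphic, informative


# Hypothetical toy alignment of four "genomes" (not data from the study).
toy = ["ACGTA", "ACGTT", "AGGTT", "AGCTA"]
length, poly, info = site_summary(toy)
print(f"{poly}/{length} polymorphic ({100 * poly / length:.1f}%), {info} parsimony-informative")

# The same percentage arithmetic applied to the core-genome counts reported above.
print(f"{100 * 292_227 / 822_537:.1f}% of core sites polymorphic")
```

Applied to the real data, this would only be comparable to the published figures if run on the same Mugsy core alignment; a different treatment of gaps can shift the counts.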
Phylogenetic reconstruction based on these core genome sequences confirmed our previous observation, based on selected housekeeping genes (3, 4), that S. pneumoniae is a single lineage in a cluster otherwise composed of S. mitis, that S. pseudopneumoniae occupies an intermediary position, and that all three species are well separated from S. oralis and S. infantis (Fig. 1). The average genetic distance of members of the S. mitis/S. pneumoniae/S. pseudopneumoniae cluster to the designated type strain of S. oralis, ATCC 35037, used as a common root, is slightly but significantly (P < 0.0001) greater for S. pneumoniae (0.001309 ± 0.0002) than for S. mitis (0.001278 ± 0.0008). This supports our hypothesis (3) that the S. pneumoniae lineage is phylogenetically the most ancient and has only recently been undergoing a population burst facilitated by the exponentially expanding human species, its primary host. In contrast, because the commensal species spread vertically (21), their success does not depend on host population size.
FIG 1  Phylogenetic tree of Streptococcus strains included in the study. The tree, generated by the minimum-evolution algorithm in MEGA version 5.2, was based on 822,537-nt sequences shared by all 35 genomes listed in Table S1 in the supplemental material. It illustrates that S. pneumoniae is a single lineage in a cluster otherwise composed of S. mitis and that S. pseudopneumoniae occupies an intermediary position. The bar represents the genetic distance.
Reductive evolution of the S. mitis genome. Detailed comparison of the gene contents of the 35 genomes confirmed the previously demonstrated sporadic occurrence of recognized S. pneumoniae virulence factors in S. mitis strains (3, 22, 23). Most strikingly, 12 of the 15 S. mitis strains had a complete cps locus in the same genomic region as in S. pneumoniae.
Likewise, putative virulence factors such as IgA1 protease and zinc metalloprotease C, neuraminidases A and B, autolysin, pneumolysin, several choline-binding proteins, and PavA were present in some strains of S. mitis and absent in others (Fig. 2).
FIG 2  Comparative analysis of the gene contents of 35 genomes of S. pneumoniae, S. mitis, S. pseudopneumoniae, and S. oralis. The genome of S. pneumoniae TIGR4 served as the reference. Green indicates the presence and red the absence of genes. Transposase genes are indicated by light-blue horizontal lines on the right. The figure illustrates the strain-specific reductive evolution of S. mitis genomes, resulting in gene loss to various extents, including loss of genes encoding virulence properties in S. pneumoniae.
To determine whether such shared virulence genes represent pneumococcal genes transferred to S. mitis or genes ancestral to both, we generated phylogenetic trees of all predicted genes in S. pneumoniae TIGR4 and their orthologs identified in the 35 genomes. In trees of virulence genes (one example is shown in Fig. S1A in the supplemental material), S. pneumoniae formed a tight cluster, whereas S. mitis strains formed more diverse lineages in patterns congruent with the core genome-based tree (Fig. 1). This proves that they are ancestral genes that have been diversifying in parallel with other parts of the genome and were subsequently lost by some S. mitis strains in a reductive evolutionary process. This loss is reflected in S. mitis genomes that are up to 15% smaller than those of S. pneumoniae (see Table S1). However, a surprising proportion (23.6%) of the 1,620 trees generated from the nucleotide sequences of all genes (excluding transposases and genes unique to S. pneumoniae strains) showed clustering of S. pneumoniae genes among S. mitis genes. We interpret this as evidence that S. pneumoniae strains acquired homologous gene sequences from strains of S. mitis.
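The 1,620 gene trees were examined manually. As a hedged illustration of the underlying question, the sketch below tests whether the S. pneumoniae leaves of an already rooted gene tree form a single clade; a tree in which they do not is a candidate for import of an S. mitis allele. Trees are represented as nested Python tuples purely for brevity (an assumption; real trees would first have to be parsed from Newick output), and the function names and leaf names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every internal node of a rooted tree."""
    if isinstance(tree, str):
        return
    yield leaves(tree)
    for child in tree:
        yield from clades(child)


def is_monophyletic(tree, taxa):
    """True if `taxa` are exactly the leaves of one clade (or a single leaf)."""
    taxa = set(taxa)
    if len(taxa) == 1:
        return taxa <= leaves(tree)
    return any(clade == taxa for clade in clades(tree))


# Hypothetical rooted gene trees; "pn" = S. pneumoniae, "mi" = S. mitis.
pneumo = {"pn1", "pn2", "pn3"}
tight = ((("pn1", "pn2"), "pn3"), ("mi1", "mi2"))
nested = ((("pn1", "pn2"), "mi1"), ("pn3", "mi2"))

print(is_monophyletic(tight, pneumo))   # True: one S. pneumoniae cluster
print(is_monophyletic(nested, pneumo))  # False: pn3 groups with an S. mitis leaf
```

Monophyly depends on where the tree is rooted, so in practice all trees would need a consistent outgroup, such as the S. oralis type strain used as the common root in Fig. 1.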
Although occasional trees identified the source of the gene sequence, most transfers originated from putative S. mitis donor clones not represented in our sample of the undoubtedly large global S. mitis population (see Fig. S1B). The transfers from S. mitis to S. pneumoniae often affected several adjacent genes, with transferred sequences spanning 116 bp to 10,600 bp, in full agreement with the sizes observed in an in vitro recombination experiment involving one strain each of S. mitis and S. pneumoniae (18). As shown in Table 1 and reflected in the phylogenetic tree in Fig. 1, Hungary19A was the S. pneumoniae strain that acquired the largest proportion of genes from S. mitis (8.2% of genes, corresponding to 141 kb). S. pseudopneumoniae showed extensive recombination between the S. pneumoniae lineage and S. mitis lineages, reflected in its intermediary position in the phylogenetic tree and its admixture of phenotypic traits of the two species (24): 86% of its genes clustered with S. mitis and 14% with S. pneumoniae. No clear evidence of acquisition by S. mitis strains of gene sequences from S. pneumoniae was detected. However, as previously reported (18–20), genes encoding transpeptidases (“penicillin-binding proteins”), gyrase, and adjacent genes (e.g., orthologs of SP_0335, SP_0370, SP_0371, SP_1218, and SP_1662-1669) (25) revealed mosaic sequence structures (see Fig. S1C and D). This reflects multiple homologous recombination events between S. pneumoniae and S. mitis, often without clear evidence of the direction of transfer.
TABLE 1  Numbers of gene replacements in S. pneumoniae strains imported from S. mitis
Strain         Serotype   No. (%) of genes imported from S. mitis(a)
Hungary19A     19A        133 (8.2)
Taiwan19F      19F        88 (5.4)
CGSP14         14         85 (5.3)
ATCC 700669    23F        72 (4.4)
P1031          1          71 (4.4)
TIGR4          4          61 (3.8)
TCH8431        19A        60 (3.7)
670            6B         56 (3.5)
JJA            14         56 (3.5)
G54            19F        45 (2.8)
70585          5          38 (2.4)
D39            2          28 (1.7)
R6             Rough 2    27 (1.7)
(a) Based on analysis of 1,620 annotated genes shared by S. pneumoniae TIGR4 and other isolates.
Evolution of capsular polysaccharide diversity in S. pneumoniae. Next, we tested the hypothesis that gene import explains the extreme, and so far enigmatic, structural diversity of capsular polysaccharides in S. pneumoniae (n = 95). The pneumococcal cps operons consist of 12 to 22 genes directly involved in synthesis and transport of the polysaccharides (7). Among these, the glycosyl transferases, glycosyl phosphotransferases, dehydrogenases, mutases, and epimerases are often unique to one or more serotypes and determine the distinct polysaccharide structure (26). We aligned each protein (n = 1,575) encoded by the cps loci of the S. pneumoniae serotypes (7) against the NCBI nonredundant protein database. This provided evidence of extensive import of cps operon genes from numerous Streptococcus species, including other members of the mitis group (S. mitis, “Streptococcus mitis biovar 2,” S. oralis, S. infantis, Streptococcus sanguinis, Streptococcus parasanguinis, and Streptococcus peroris) and members of the more distant anginosus and salivarius groups. The number of genes imported from one or several donor species ranged from a single gene to the entire cps locus (see Table S2 in the supplemental material). Imported genes included genes that were part of a cps operon in the donor, as well as genes with other glycosylation functions outside the cps locus. The nucleotide identity between the putative donor and recipient cps genes ranged from 84 to 99%, presumably reflecting the time elapsed since the transfer and/or the existence of donors not represented among the genome-sequenced streptococci.
For instance, the membrane-associated flippase, responsible for transferring the oligosaccharide chains to the exterior of the pneumococcal membrane, is encoded in all pneumococcal cps operons except that of serotype 3 (7, 26). The genetic diversity among pneumococcal flippase genes is ~16 times greater than the overall diversity of the pneumococcal core genome (0.280% ± 0.056% versus 0.017% ± 0.003%), supporting a diverse origin of the gene among pneumococci (see Fig. S2 in the supplemental material). Alignments revealed >98% amino acid identity between the flippases of several pneumococcal serotypes and those of strains from a range of Streptococcus taxa that are otherwise genetically more distant. The assumed direction of transfer was further supported by two gene-based observations. First, genes from clonally independent S. pneumoniae strains of the same serotype were more conserved than the corresponding genes among strains of the donor species (Fig. 3). Second, several genes that are intact in the putative donor were pseudogenes in pneumococci. Comparison of serotypes belonging to the same serogroup (e.g., serogroups 7, 18, and 19) revealed that mutations resulting in pseudogenes, in some cases combined with import of additional genes from other donors or complete deletion of genes, have driven the structural diversification within serogroups (see Fig. S3). As an example, an evolutionary model for the origin and diversification of S. pneumoniae serogroup 19 is presented in Fig. 4.
FIG 3  Example of comparisons and genetic distances of cps locus genes among S. pneumoniae and S. mitis strains. The nucleotide sequence identity (%) of orthologous genes in S. pneumoniae serotype 2 and “S. mitis biovar 2” strains is shown for each flanking pair. Clonally independent strains of the respective serotypes of S. pneumoniae showed more conservation than strains of the donor species, supporting the proposed transfer from S. mitis to S. pneumoniae.
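The identity and diversity comparisons above rest on two simple quantities: percent identity between two aligned sequences, and the mean pairwise distance within a set of sequences. The sketch below computes both using the uncorrected p-distance, which is a deliberate simplification: the study computed genetic distances in MEGA 5.2 with the maximum composite likelihood model, so values from this sketch would not match the published ones. The gap handling (skipping any column with a gap in either sequence), the function names, and the toy sequences are assumptions.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def mean_p_distance(seqs):
    """Mean uncorrected p-distance (differences per site) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(1 - percent_identity(a, b) / 100 for a, b in pairs) / len(pairs)


# Hypothetical toy alignments, not data from the study.
donor = "ATGAAACTGTTC-GCA"
recipient = "ATGAAGCTGTTCAGCA"
print(f"donor vs recipient: {percent_identity(donor, recipient):.1f}% identity")

gene_set = ["AAAA", "AAAT", "AATT"]
print(f"mean p-distance: {mean_p_distance(gene_set):.3f}")

# Fold difference between the flippase and core-genome diversities reported above.
print(f"fold difference: {0.280 / 0.017:.1f}")
```

The last line only restates the arithmetic behind the roughly 16-fold difference between flippase and core-genome diversity.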
FIG 4  Phylogenetic model for acquisition and diversification of the four serogroup 19 S. pneumoniae serotypes. Acquisition of the entire capsular biosynthesis operon from S. mitis (SK564) introduced the serotype 19c capsule into S. pneumoniae. Subsequent incorporation of transposase and RUP sequences into the operon facilitated transfer to other strains of S. pneumoniae, in which allelic replacement with selected genes acquired from “S. mitis biovar 2” (this taxon is erroneously classified as a biovar of S. mitis), loss of genes, and gene mutations resulted in the structurally distinct capsular polysaccharides of serotypes 19b, 19a, and 19f. A detailed comparison of the cps operons of the four serogroup 19 serotypes is shown in Fig. S3 in the supplemental material.
Parallel evolution of genome plasticity and genome stability. These findings indicate that interspecies gene transfer between S. pneumoniae and neighboring species is unidirectional, i.e., from other species to S. pneumoniae. Several further observations support this conclusion. Competence for genetic transformation in pneumococci depends on 22 genes (27). Screening of the 35 genomes identified all 22 genes in every S. pneumoniae and S. pseudopneumoniae genome, whereas only 10 of the 15 S. mitis strains and none of the S. oralis strains possessed all of them. Up to 3 of the 22 essential genes were missing or significantly truncated in some strains (see Table S3 in the supplemental material), suggesting reduced or absent transformation competence. Several other genes facilitate the incorporation of foreign genetic elements into the pneumococcal genome. S. pneumoniae strains possess one of two complementary Dpn restriction-modification systems, DpnI or DpnII (28), which are part of the competence (com) regulon. It was recently demonstrated that induction of this system is necessary for optimal pathogenicity island transfer (29). Whereas all S. pneumoniae strains possessed one of these systems, the majority of S. mitis strains lacked intact dpn loci (see Fig. S4 and Table S1 in the supplemental material). In these strains, the genes were either absent from this location and from the rest of the genome or replaced by other genes, such as a transposase and an integrase in strain SK667. Where the Dpn locus genes were present in S. mitis, alignments with those of S. pneumoniae showed that they are ancestral genes that have diversified in parallel with other parts of the respective genomes. Interestingly, a third version of the locus (here termed DpnIII) was found in S. pneumoniae ATCC 700669, S. mitis SK578, and S. oralis ATCC 35037. In these strains, the locus consisted of two genes on opposite strands, encoding a restriction enzyme resembling MutH of Escherichia coli and a DNA (cytosine-5-)-methyltransferase family protein. Other strains of S. oralis lacked Dpn locus genes. These observations suggest that the Dpn-associated function is deteriorating in S. mitis and S. oralis. Transposases mediate intra- and interstrain mobility of genes or gene islands in many bacteria (30). The 13 S. pneumoniae strains possessed 19 to 111 (median, 77) such elements distributed over the entire genome (Fig. 2), some of which are degenerate, consistent with their constant adaptation to the transforming genome. Notably, transposases are associated with the cps operons of all pneumococcal serotypes, in most cases flanking the entire operon (7). Although most S. mitis and all S. oralis strains examined had complete cps operons, none of these operons included transposases. In general, S. mitis and S. oralis genomes harbored significantly fewer transposase genes (median number, 8) (see Table S1 in the supplemental material). One exception was S. mitis strain B6 (31), which is unusual among S. mitis strains in several respects, including its genome size.
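The transposase medians reported above (77 per S. pneumoniae genome versus 8 in S. mitis and S. oralis) are per-genome counts summarized by group. The text does not state how transposase genes were identified, so the sketch below assumes a simple keyword match on product annotations; the function names, genome names, and annotations are hypothetical, and degenerate copies are counted whenever they are still annotated as transposases.

```python
from statistics import median


def count_transposases(annotations):
    """Count genes per genome whose product annotation mentions "transposase".

    Keyword matching on product text is a simplifying assumption; it also counts
    degenerate (truncated) copies if they are annotated as transposases.
    """
    return {
        genome: sum("transposase" in product.lower() for product in products)
        for genome, products in annotations.items()
    }


def group_medians(counts, groups):
    """Median per-genome count for each named group of genomes."""
    return {name: median(counts[g] for g in members) for name, members in groups.items()}


# Hypothetical toy annotations; genome names and products are illustrative only.
annotations = {
    "pneumo_A": ["IS630 family transposase", "hypothetical protein", "transposase, truncated"],
    "pneumo_B": ["Transposase", "ABC transporter permease"],
    "mitis_A": ["hypothetical protein"],
    "mitis_B": ["transposase OrfA", "hypothetical protein"],
}
groups = {"S. pneumoniae": ["pneumo_A", "pneumo_B"], "S. mitis": ["mitis_A", "mitis_B"]}

counts = count_transposases(annotations)
print(counts)
print(group_medians(counts, groups))
```

A median rather than a mean is used here because, as the B6 exception shows, single outlier genomes can carry far more elements than the rest of their species.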
Like transposases, repeat elements, including RUP (repeat units of pneumococcus), are assumed to facilitate genomic plasticity as well as phase variation of genes (32, 33). Besides facilitating traditional homologous recombination, repeat regions can contribute to diversity in another way: a recent report demonstrated that pneumococci can generate diversity by transformation with fully homologous “self” DNA, producing a variety of merodiploids within a population through alternative pairing of repeat regions present in different parts of the genome (34). Analysis of the 35 genomes showed that pneumococcal genomes had 53 to 63 RUP elements, including one or two within the cps locus of all serotypes (except serotypes 5, 11a, and 23b), whereas S. mitis strains had either none or no more than three elements in the entire genome (see Table S1 in the supplemental material). Bacterial defense systems against attack by foreign DNA include clustered regularly interspaced short palindromic repeat (CRISPR) loci. In agreement with a recent report (29), none of the S. pneumoniae strains possessed CRISPR sequences. This is consistent with the finding that CRISPR loci artificially inserted into a pneumococcal genome were spontaneously ejected under environmental stress (35). Likewise, the S. pseudopneumoniae strains did not possess CRISPR/Cas systems. However, 5 of the 15 S. mitis strains and 4 of the 5 S. oralis strains possessed CRISPR sequences (see Table S1 in the supplemental material). A few of the spacers showed sequence similarity to bacteriophage/prophage sequences, most of which are Streptococcus specific and in some cases are integrated in S. pneumoniae and S. pseudopneumoniae genomes (not shown).
DISCUSSION
One factor in the coevolution of obligate symbionts of humans that has so far received little attention is the impact of the susceptible host population size.
This factor is of particular importance for pathogenic (i.e., parasitic) species that induce immunity or sometimes death, leaving the host inaccessible for repeated colonization. Thus, successful survival of the pathogen requires a sizeable host population of sufficient density to allow spread between susceptible hosts and/or a capacity of the pathogen for constant antigenic change. In contrast, commensals that achieve a mutualistic lifestyle induce a tolerogenic response in the host’s immune system, allowing continued colonization and an intimate and potentially lifelong association (36). Many species of the genus Streptococcus are almost exclusively adapted to humans and other hominids. S. pneumoniae is one of the most important pathogens affecting humans (2). Although it is a widespread colonizer, particularly of children in day care centers, both colonization and infection result in rapid elimination (median duration, 19 days) by antibodies directed to the capsular polysaccharide and presumably other surface-exposed antigens (37). In contrast, the closely related S. mitis is a lifelong companion of all humans in the upper respiratory tract and is often present as mixed populations of multiple clones (38, 39). We have previously demonstrated that the two species share an immediate ancestor and have argued that the ancestor was a pneumococcus-like species, presumably pathogenic to the immediate ancestor of hominids (3). The genome-based data obtained in this study support this model. Our results furthermore illustrate how a significant selection pressure resulting from a shortage of potential hosts (40) was handled by the S. pneumoniae/S. mitis/S. pseudopneumoniae ancestor in two opposing ways occurring in parallel. S. pneumoniae maintained its pathogenic potential, which facilitates horizontal spread, and optimized its genome plasticity (17). In contrast, harmonious coexistence by the majority of lineages becoming S.
mitis was achieved by elimination of properties that challenge the host, combined with increased genome stability (i.e., partial loss of competence genes, transposases, repeat elements, and the Dpn restriction-modification system, together with acquisition of CRISPR/Cas sequences). Interestingly, these S. mitis lineages are now highly diverse and, according to traditional taxonomic standards, would represent separate species (3). An important factor in this diversification process has been the ecological and genetic isolation of clones colonizing distinct lineages of human hosts, combined with a vertical spreading pattern. Our finding of varying degrees of loss of virulence-associated factors and of properties contributing to genome plasticity among the examined S. mitis strains (Fig. 2; see also Table S1 in the supplemental material) indicates that this is an ongoing process that individual S. mitis lineages have carried to different degrees of completion. Future studies may reveal whether this is reflected in the occasional ability of S. mitis strains to cause bacteremia or endocarditis in groups of predisposed patients (41, 42). Another consequence of the need for S. pneumoniae to expand its ecological niche may be the adaptation of certain clones to an equine host, which also involved loss of virulence-associated genes (43). The availability of a critical population of potential hosts (40) became an evolutionary bottleneck for the pathogen, reflected in the significant homogeneity of the core genome of today’s pneumococcus (Fig. 1). In addition to the expression of crucial virulence properties, life as a pathogen required the S. pneumoniae lineage to achieve optimal genome plasticity, enabling antigenic diversity of surface structures. For example, the relative sequence diversification of the paralogous zinc metalloproteases IgA1 protease, ZmpB, and ZmpD is striking evidence of significantly enhanced selection for diversification of surface-exposed proteins in the pathogen S.
pneumoniae compared to the closely related commensal streptococci (16). In addition to homologous recombination within the pneumococcal population, our results show that the pneumococcus remarkably met the need for diversification by continued exploitation of the gene pool of neighboring species. In some S. pneumoniae strains, up to 8.2% of the alleles of genes were imported from S. mitis (Table 1). This ongoing process is facilitated by the pneumococcus's colonization, albeit brief, of an ecological niche where it frequently meets multiple members of related commensal species that serve as a genetic toolbox. Most remarkable is our finding that the previously enigmatic diversity of capsular polysaccharide structures expressed by S. pneumoniae is a direct result of gene import from several species of commensal streptococci, including S. mitis, “S. mitis biovar 2” (mislabeled, since it is more closely related to S. oralis [4]), S. oralis, S. infantis, S. sanguinis, S. parasanguinis, S. peroris, and members of the more distant anginosus and salivarius groups (see Table S2 in the supplemental material). In several serotypes, complete cps loci had been imported from a single donor, in some cases in several independent steps. In others, a mosaic of genes imported from distinct donors was evident. Mutations resulting in pseudogenes, import of additional genes from other donors, or complete deletion of genes contributed to the diversification that distinguishes serotypes belonging to the same serogroup (e.g., serogroups 7, 18, and 19) (Fig. 4; see also Fig. S3). This process will conceivably continue to generate additional antigenic diversity that may challenge the currently successful prevention of pneumococcal infections by vaccination. This is the first demonstration of how selective pressures resulting from a shortage of potential hosts were resolved by bacteria in two opposing ways occurring in parallel. Harmonious coexistence by lineages becoming S.
mitis was achieved by elimination of properties that challenge the host, combined with increased genome stability. Life as a pathogen required the S. pneumoniae lineage to combine optimal genome plasticity with antigenic diversity of surface structures, including capsular polysaccharides, a challenge it remarkably met by continued exploitation of the gene pool of neighboring species. More recently, the dramatic expansion of the susceptible host population has ensured the success of the S. pneumoniae lineage, as reflected in the lineage-specific boost of the pneumococcal population.
MATERIALS AND METHODS
Bacterial genomes. The 35 streptococcal genomes examined in the study are listed, together with their NCBI accession numbers, in Table S1 in the supplemental material. The 11 genomes sequenced as part of this study were generated on the 454 platform (GS20, FLX, and/or Titanium) and assembled with the Newbler assembler. Details on the libraries constructed, sequencing coverage, and assembler version used are available in the GenBank entries.
Alignment of genomes. A multiple whole-genome nucleotide alignment of contigs or complete chromosomes from the 35 genomes was generated using the software program Mugsy (44), and clusters of syntenic orthologs across the genomes were obtained with Mugsy-Annotator (45).
Phylogenetic analyses. A phylogenetic tree based on the concatenated core genome sequences from the Mugsy alignment was generated using the minimum-evolution algorithm with the maximum composite likelihood model in MEGA 5.2 (46) and validated by bootstrap analysis with 500 replicates. Recombination in selected genes was visualized using the program SplitsTree 4 (47).
Bioinformatics tools and analyses.
Annotated genome sequences from the 35 genomes (see Table S1 in the supplemental material) and Mugsy-Annotator clusters of syntenic orthologs were loaded into the Sybil comparative genomics software package (48) for comparative analyses. To determine the extent of recombination between S. pneumoniae and the related commensal species, we aligned nucleotide sequences within Mugsy-Annotator clusters and generated minimum-evolution phylogenetic trees in MEGA 5.2. A total of 1,620 trees (excluding transposases and genes unique to S. pneumoniae strains) were generated and examined manually. The presence or absence of annotated genes based on Mugsy-Annotator clusters was detected in Sybil and confirmed by blastn analysis (49). Figure 2 was generated by loading profiles of gene presence and absence into the MeV interface (50). RUP (repeat unit of pneumococcus) elements were identified by searching TIGR4 RUP sequences (32) with blastn against the 35 genomes. Genetic distances (the number of base substitutions per site, averaged over all sequence pairs) were determined in MEGA 5.2 using the maximum composite likelihood model (51), based on aligned single genes or on concatemers of six multilocus sequence typing (MLST) genes of S. pneumoniae (52). CRISPR regions were identified using the CRISPR finder tool (http://crispr.u-psud.fr).
Nucleotide sequence accession numbers. The Whole Genome Shotgun projects have been deposited at DDBJ/EMBL/GenBank under the following accession numbers: Streptococcus mitis SK137, JPFS00000000; Streptococcus mitis SK271, JPGW00000000; Streptococcus mitis SK1126, JPFT00000000; Streptococcus mitis SK629, JPFU00000000; Streptococcus mitis SK667, JPFV00000000; Streptococcus mitis SK642, JPFW00000000; Streptococcus mitis SK637, JPFX00000000; Streptococcus mitis SK578, JPFY00000000; Streptococcus mitis SK608, JPFZ00000000; Streptococcus oralis SK141, JPGA00000000; Streptococcus oralis SK143, JPGB00000000.
The versions described in this paper are versions XXXX01000000.
SUPPLEMENTAL MATERIAL
Figure S1 Phylogenetic trees based on aligned nucleotide sequences of selected genes in S. pneumoniae and related species. The tree in panel A was generated in MEGA 5.2 using the minimum-evolution algorithm, and the numbers on branches represent bootstrap values. Trees in panels B, C, and D were generated with the SplitsTree 4 software program. (A) The position of the pneumolysin gene (ply) from two strains of S. mitis and one strain of S. pseudopneumoniae, distant from the cluster of S. pneumoniae gene sequences, shows that the ply genes are ancestral genes that have been diversifying in parallel with other parts of the respective genomes. (B) Clustering of several S. pneumoniae gene sequences among S. mitis genes (indicated by arrows) is evidence of transfer from S. mitis to S. pneumoniae strains. The intermediary position of strain P1031 illustrates transfer of part of the gene. (C and D) Trees generated in SplitsTree, illustrating extensive intra- and interspecies recombination in genes encoding penicillin-binding protein 1A (orthologs of SP_0369) (C) and in the neighboring gene encoding recombination protein U (orthologs of SP_0370) (D).
Figure S2 Phylogenetic tree based on aligned amino acid sequences of the flippase protein involved in capsular polysaccharide biosynthesis. The tree was generated in MEGA 5.2 using the minimum-evolution algorithm. It illustrates significant sequence diversity and clustering of S. pneumoniae sequences with sequences of flippases from distantly related Streptococcus species, indicated by arrows. The bar indicates the genetic distance.
Figure S3 Comparison of cps operon structures in S. mitis SK564 and in S. pneumoniae operons encoding serotypes of serogroup 19. Gray connecting boxes indicate genes that were part of the same cluster of syntenic orthologs. The S.
mitis SK564 and S. pneumoniae serotype 19c operons are identical apart from the transposon gene (tnp) and the mutated glf gene in the latter. Hatched genes in the serotype 19f and 19a operons represent allelic replacements, relative to SK564 and serotypes 19c and 19b, acquired from different donors, including “S. mitis biovar 2” (this taxon is erroneously classified as a biovar of S. mitis).
Figure S4 Organization of the Dpn locus in representative S. pneumoniae, S. mitis, and S. pseudopneumoniae strains. S. pneumoniae strains possess one of two complementary Dpn restriction-modification systems, DpnI or DpnII. The DpnI organization consists of dpnI, encoding an atypical restriction enzyme cleaving methylated double-stranded DNA, and dpnD, of unknown function. The DpnI locus is shown for strain TIGR4 and is representative of S. pneumoniae strains Taiwan19F, JJA, R6, D39, CGSP14, and TCH8431/19A. The DpnII organization, consisting of the methylase DpnIIA, the DpnIIB restrictase, and the DpnM double-stranded DNA methylase, is shown for S. pneumoniae P1031 and is also found in S. pneumoniae strains 670, 70585, and Hungary19A. Only two S. mitis strains (SK321 and SK629) possessed the DpnI locus. However, in both strains the dpnD gene was disrupted by a frameshift, and SK321 additionally harbored an in-frame stop codon. Three S. mitis strains (B6, SK1126, and SK597) had a DpnII-like locus encoding DpnM and DpnIIA and a type II restriction enzyme (MjaIII) distinct from the DpnIIB restrictase of pneumococcal strains. In strains B6 and SK1126, a gene encoding a 270-amino-acid (aa) protein with 89% amino acid identity to a hypothetical protein in Streptococcus sp. HPH0090 was interposed between the gene encoding this restrictase and DpnA, whereas in SK597 this position was occupied by a 622-bp sequence with no open reading frame or homology to any sequence in the NCBI database. In the majority of S.
mitis strains, the dpn genes were either missing in this location and in other parts of the genome or replaced by other genes, such as a transposase and an integrase in strain SK667. S. mitis SK578 and S. pneumoniae ATCC 700669 had neither of the two described Dpn operons either in the usual region or in any other parts of the genome. Instead, both had two genes in opposite directions encoding a restriction enzyme resembling MutH of Escherichia coli and a DNA (cytosine-5-)-methyltransferase family protein presumably constituting a third version of the Dpn locus (DpnIII). The same locus structure was identified in S. oralis strain ATCC 35037, whereas other strains of S. oralis lacked these genes (not shown). In S. pseudopneumoniae strain IS7493, the locus included two transposase genes flanking a gene encoding a lipid A core-O-antigen-ligase-like enzyme. However, homologues of DpnIII genes were found elsewhere in the genome. The S. infantis genome showed no evidence of dpn genes. Alignments of the Dpn locus genes in S. pneumoniae and S. mitis strains showed that they are ancestral genes diversified in parallel with other parts of the respective genomes. Download Figure S4, EPS file, 3.6 MB Table S1 Streptococcus genomes examined in this study with selected characteristics Table S1, DOCX file, 0.1 MB. Table S2 S. pneumoniae capsular polysaccharide biosynthesis (cps) operon genes imported from other bacterial species and their respective donors; due to their widespread occurrence and conservation, the regulatory genes wzg, wzh, wzd, and wze and the four rhamnose pathway genes (rmlA to rmlD) are not included; the figures represent the sequence identity of the gene to that of the donor relative to the sequence identity of the core genomes of the recipient and S. pneumoniae TIGR4 Table S2, DOCX file, 0.1 MB. Table S3 Presence in S. mitis, S. oralis, and S. pseudopneumoniae of genes shown to be essential in S. 
pneumoniae for transformation competence Table S3, DOCX file, 0.1 MB.
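Figure S4 describes DpnI and DpnII as complementary systems, with DpnI cleaving methylated DNA. The legend does not state some background facts used here. DpnI and DpnII both recognise the palindromic site 5'-GATC-3'. DpnI cleaves it only when the adenine is methylated, and DpnII only when it is unmethylated. The sketch below models that complementarity on a toy sequence. The `methylated` set and the function names are hypothetical. The model ignores hemimethylated sites, the other Dpn gene products, and exact cleavage positions.

```python
import re

# DpnI and DpnII share the palindromic recognition site 5'-GATC-3'.
GATC = re.compile("GATC")


def gatc_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every GATC site in seq."""
    return [m.start() for m in GATC.finditer(seq.upper())]


def predicted_cuts(seq: str, methylated: set[int], enzyme: str) -> list[int]:
    """Return the GATC sites that `enzyme` is expected to cleave.

    `methylated` is a hypothetical set of site positions whose adenines are
    methylated. DpnI cuts only methylated sites; DpnII cuts only unmethylated ones.
    """
    sites = gatc_sites(seq)
    if enzyme == "DpnI":
        return [s for s in sites if s in methylated]
    if enzyme == "DpnII":
        return [s for s in sites if s not in methylated]
    raise ValueError(f"unknown enzyme: {enzyme!r}")


if __name__ == "__main__":
    toy = "AAGATCTTGATCAA"  # GATC sites at positions 2 and 8
    print(predicted_cuts(toy, {2}, "DpnI"))   # [2]
    print(predicted_cuts(toy, {2}, "DpnII"))  # [8]
```

In this toy model every GATC site is cleaved by exactly one of the two enzymes. That is the sense in which the two restriction activities are complementary.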


          Most cited references (44)


          The global burden of group A streptococcal diseases.

          The global burden of disease caused by group A streptococcus (GAS) is not known. We review recent population-based data to estimate the burden of GAS diseases and highlight deficiencies in the available data. We estimate that there are at least 517,000 deaths each year due to severe GAS diseases (e.g., acute rheumatic fever, rheumatic heart disease, post-streptococcal glomerulonephritis, and invasive infections). The prevalence of severe GAS disease is at least 18.1 million cases, with 1.78 million new cases each year. The greatest burden is due to rheumatic heart disease, with a prevalence of at least 15.6 million cases, and with 282,000 new cases and 233,000 deaths each year. The burden of invasive GAS diseases is unexpectedly high, with at least 663,000 new cases and 163,000 deaths each year. In addition, there are more than 111 million prevalent cases of GAS pyoderma and over 616 million incident cases of GAS pharyngitis per year. For most diseases, epidemiological data from developing countries are poor. On a global scale, GAS is an important cause of morbidity and mortality. These data emphasise the need to reinforce current control strategies, develop new primary prevention strategies, and collect better data from developing countries.

            Genomic islands in pathogenic and environmental microorganisms.


              Genetic Analysis of the Capsular Biosynthetic Locus from All 90 Pneumococcal Serotypes

              Introduction Streptococcus pneumoniae (the pneumococcus) is a major cause of morbidity and mortality worldwide, causing diseases that range in severity from meningitis, septicaemia, and pneumonia to sinusitis and acute otitis media [1,2]. Factor (typing) sera are used to divide pneumococci into serotypes and serogroups, which include immunologically related serotypes. These sera have been developed by a process of multiple cross-absorptions, which render them specific for the immunochemical differences between the pneumococcal capsular polysaccharides (CPSs) [3]. At present, 90 individual serotypes are recognised by their patterns of reactivity with the factor sera [4], and serotypes vary in the extent to which they are carried in the nasopharynx and the degree to which they are recovered from different disease states [5,6]. Expression of a capsule is important for survival in the blood and is strongly associated with the ability of pneumococci to cause invasive disease. The capsule is surface exposed, and antibodies against CPS provide protection against pneumococcal disease. Consequently, polyvalent polysaccharide vaccines have been developed in which CPS from the serotypes most commonly associated with invasive disease in children are linked to a protein carrier, and a seven-valent conjugated polysaccharide vaccine has been introduced and shown to be highly effective [7,8]. A 23-valent polysaccharide vaccine is also available for use in adults [9]. With the exception of types 3 and 37, which are synthesised by the synthase pathway [10–14], pneumococcal CPSs are generally synthesised by the Wzx/Wzy-dependent pathway (Figure 1). The genes for the latter pathway are located at the same chromosomal locus (cps), between dexB and aliA [15–17]. 
CPSs are synthesised by transfer of an initial monosaccharide phosphate from a nucleotide diphosphate sugar to a membrane-associated lipid carrier, followed by the sequential transfer of further monosaccharides to produce the lipid-linked repeat unit. This is transferred to the outer face of the cytoplasmic membrane by the repeat-unit transporter or flippase, polymerised to form the mature CPS, and then attached to the peptidoglycan [18]. The cps locus therefore typically encodes the enzymes to build the repeat unit, including an initial glycosyl phosphate transferase, and additional transferases responsible for the formation of the linkages, and to allow for the addition of sugars (or other moieties), or to otherwise modify the repeat unit, as well as a repeat-unit flippase and polymerase [15]. Figure 1 Representation of the Wzx/Wzy-Dependent Pathway for Biosynthesis of CPS 9A Pictured is a hypothetical model for capsule biosynthesis in S. pneumoniae based on a mixture of experimental evidence and speculation. For a recent review, see Yother [15]. (1) Non-housekeeping nucleotide sugar biosynthesis. (2) The initial transferase (WchA in this case) links the initial sugar as a sugar phosphate (Glc-P) to a membrane-associated lipid carrier (widely assumed to be undecaprenyl phosphate). (3) Glycosyl transferases sequentially link further sugars to generate repeat unit. (4) Wzx flippase transports the repeat unit across the cytoplasmic membrane. (5) Wzy polymerase links individual repeat units to form lipid-linked CPS. (6) Wzd/Wze complex translocates mature CPS to the cell surface and may be responsible for the attachment to peptidoglycan. The complex of WchA, Wzy, Wzx, Wzd, and Wze shown in the membrane is based on that in Figure 2 of Whitfield and Paiment [47] for the related Escherichia coli Type 1 capsule. 
Figure 2 Capsule Biosynthesis Genes and Repeat-Unit Polysaccharide Structures Shown are the cps gene clusters for cases discussed in the text, together with the polysaccharide structure of the encoded repeat unit where known [31] (the full set is shown in Figure S1). Genes are represented on the forward and reverse strands by boxes coloured according to the gene key, with gene designations indicated above each box. Grey blocks indicate regions of sequence similarity between gene clusters. Repeat-unit structures are displayed with the linkage to undecaprenyl pyrophosphate at the right-hand side (not necessarily the case for the published structures [31]), so residue numbers are counted from right to left. Monosaccharides are represented as shapes coloured according to the structure key. Housekeeping sugars are coloured grey. Non-housekeeping sugar colours correspond to the associated sugar biosynthesis gene colours. Glycerol, choline, and acetate are indicated as text. Also shown are the nature of linkages with the associated gene, and the linkages between repeat units created by the Wzy polymerase. Gene designations are in parentheses where their substrate specificity is unclear. The substantial diversity of pneumococcal CPSs is believed to have arisen as a consequence of selection for antigenic diversity imposed by the human immune system [6]. The evolutionary timescales and the genetic events by which novel serogroups and serotypes arise are unclear. Comparisons of the available cps loci indicate a variety of genetic mechanisms and show that the central genes responsible for the synthesis and polymerisation of the repeat unit are highly variable and often non-homologous between serotypes. These genes have a low percentage G+C content, and new serotypes may frequently have been generated by the introduction of novel cps genes into pneumococci by lateral gene transfer from other species. 
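The argument above for lateral transfer rests on the low G+C content of the serotype-specific genes. A minimal sliding-window scan that flags AT-rich windows might look like the following. The window size, step, and threshold are hypothetical parameters. A real analysis would compare each window against the genome-wide G+C content of S. pneumoniae rather than against a fixed cut-off.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def low_gc_windows(seq: str, window: int, step: int,
                   threshold: float) -> list[tuple[int, float]]:
    """Return (start, gc) for every window whose G+C fraction is below threshold."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc < threshold:
            hits.append((start, gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: a GC-rich block, an AT-rich block, then another GC-rich block.
    toy = "GCGC" * 5 + "ATAT" * 5 + "GCGC" * 5
    print(low_gc_windows(toy, window=20, step=20, threshold=0.5))  # [(20, 0.0)]
```

Overlapping windows (a step smaller than the window) give a smoother profile, at the cost of reporting each AT-rich island several times.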
A much better understanding of the complex mechanisms by which antigenic diversity arises could be obtained by using the sequences of the complete set of pneumococcal cps loci. We therefore obtained sequences of the cps locus for all 90 serotypes and used these data, together with the available polysaccharide structures and the patterns of serological reactions with typing sera, to explore the genetics of capsular diversity in this major pathogen. Here we present highlights of our analysis to date, and a more exhaustive analysis will be reported elsewhere. Results General Features of the dexB–aliA Locus from 90 Serotypes PCR products were generated from genomic DNA using primers specific for the dexB and aliA genes and ranged in size from 10,337 bp (serotype 3) to 30,298 bp (serotype 38) with an average of 20,714 bp. The synthase gene (wchE) of serotype 3 is located within the cps locus, but the type 37 cps locus, which was very similar to that of serotype 33F, is defective and serotype is determined by the type 37 synthase gene (tts) located elsewhere on the chromosome [10]. Annotation and analysis of the cps sequences revealed the generality of several previously observed characteristics. Genes for the generation of CPSs are always orientated in the same direction as the dexB and aliA genes (Figures 2 and S1). The regulatory and processing genes wzg, wzh, wzd, and wze (also known as cpsABCD) are conserved with high sequence identity in all cases and are almost always in this gene order at the 5′ end of the cps locus. In most cps clusters, the fifth gene encodes the initial glucose phosphate transferase, WchA (also known as CpsE), responsible for linkage of an activated glucose phosphate to the lipid carrier (see below). The polysaccharide polymerase (wzy) and flippase (wzx) genes are always present downstream together with a varying set of genes for glycosyl transferases, acetyl transferases, nucleotide diphosphate sugar biosynthesis, and modifying enzymes. 
In every case, there is a region of low percentage G+C content within the cps locus. The first four genes and the non-housekeeping sugar biosynthesis genes have typical percentage G+C content for S. pneumoniae, while the "serotype-specific" genes, particularly wzy and wzx, tend to have more AT-rich sequences. In the regions between the cps genes and the flanking dexB and aliA genes, there is almost always evidence of mobile genetic elements. This is largely manifested as intact or disrupted genes for insertion-sequence (IS) transposases [19,20], although in four cases we identified group-II introns [21] (serotypes 19F, 25F, 25A, and 38). We could assign a functional designation to the products of all but 26 of the 1,999 predicted coding sequences in the 90 cps regions, and most of the remainder show weak similarities to products of genes in bacterial polysaccharide gene clusters. Unsurprisingly, many coding sequences fall into the broad functional categories of glycosyl transferase (351), acetyl transferase (74), and sugar phosphate transferase (71). To make more specific assignments within such categories, we used the TribeMCL program to assemble all the annotated proteins into homology groups (HGs). Of the proteins, 91% assembled into 175 HGs, each with between two and 90 members, and the remainder formed 74 single-member HGs (Table S1). The products of wzg, wzh, wzd, and wze each fall into a single HG covering every serotype. Ignoring IS element transposases, the next largest HG comprises 65 WchA initial transferases (HG5). At the other extreme, the serotype-specific gene products are diverse, with 87 HGs for non-initial sugar transferases, 40 groups of Wzy repeat-unit polymerases, and 13 groups of Wzx flippases. Biosynthesis of Precursors for Sugars and Other CPS Components Of the 18 sugars and related compounds found in S.
pneumoniae capsules, seven are available from housekeeping metabolic pathways and nine from known dedicated pathways encoded within the cps cluster (Figure S2). These include 4-keto-N-acetyl-D-quinovosamine (UDP-KDQNAc), which is the intermediate in the two-step reaction catalysed by FnlA [22]. We found a perfect correlation between the presence of a non-housekeeping sugar in the CPS and the presence of the appropriate biosynthetic genes in the cps locus. Two of the three remaining compounds are the sugar alcohol phosphates arabinitol-1-P and mannitol-6-P. The precursors for these have not been identified. However, nucleotide-diphosphate-linked precursors can easily be derived from D-xylulose-5-phosphate or D-fructose-6-phosphate, respectively, by two-step pathways parallel to that for CDP-ribitol formation from ribulose-5-phosphate [23]. D-xylulose-5-phosphate and D-fructose-6-phosphate are central to major pathways, and there are appropriate genes for their conversion in the associated cps loci. The precursor for ribofuranose has also not been identified. A pathway for its biosynthesis by the product of a gene within the 19F cps locus (cps19R) has been proposed [24]. That proposal is supported by our observation that an orthologous gene (renamed rbsF) is present in the cps loci of all serotypes whose CPS contains ribofuranose. Choline-1-phosphate, glycerol-1-phosphate, and glycerol-2-phosphate are also found in some of the structures. CDP-choline is known to be produced by S. pneumoniae as a precursor for teichoic acid biosynthesis [25]. For glycerol-1-phosphate, we find an intact gct gene for CDP-glycerol synthesis [26] in the cps loci where expected. There are four genes associated with the presence of glycerol-2-phosphate: three of them are thought to encode a CDP-2-glycerol pathway [27], while wchX encodes the glycerol phosphotransferase.
The situation is illustrated in Figure 1 for cps9A, which has pathway genes for N-acetylmannosamine pyranose and glucuronic acid. It lacks pathway genes for glucopyranose (Glcp) and galactopyranose (Galp), because these are available in S. pneumoniae from central metabolism. Initial Transferases and Polymerisation Initial transferase WchA adds glucose-1-phosphate to undecaprenol phosphate [28] to create Und-PP-Glc (Figure 1), and we assume it performs that function in all 65 serotypes where it is present. For the known structures, there is a perfect correlation between the presence/absence of wchA and the presence/absence of glucose in the repeating unit. Where wchA is absent, the products of the fifth cps gene fall into three HGs (WciI, WcjG, and WcjH). All share the same Pfam [29] domain and have hydrophobicity profiles similar to that of the carboxy-terminal region of WchA. We suggest that they function as the initial sugar transferases, as it is known that for the Salmonella enterica wchA homologue, wbaP, the 3′ end of the gene is sufficient for transferase activity [30]. By correlation with CPS constituents, we predict the following transferred initial sugars. For WciI, they are N-acetylgalactosamine pyranose (GalpNAc) or N-acetylglucosamine pyranose (GlcpNAc); for WcjG, Galp or galactofuranose; and for WcjH, Galp. Serotype 1 is an exception, as no gene product with similarity to an initial sugar transferase has yet been identified. The initial sugar of the repeat unit is also the donor sugar in the polymerisation of the repeat units (Figure 1). The specificity of the Wzy polymerase determines the other component of this linkage, which in the case of CPS 9A is a beta (1–4) linkage to the terminal glucose of the next repeat unit. For the known structures [31], identifying the initial sugar allowed us to assign the polymerase linkage, including both its donor and its acceptor sugar (see Figures 2 and S1).
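The "perfect correlation" between wchA and glucose in the repeat unit is a presence/absence concordance test across serotypes. The sketch below assumes hypothetical per-serotype records. The `CpsRecord` type and the toy entries are illustrative only and are not data from the study. A serotype is discordant when it has the gene without the sugar, or the sugar without the gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpsRecord:
    """Hypothetical record: genes in a serotype's cps locus and sugars in its repeat unit."""
    serotype: str
    genes: frozenset[str]
    sugars: frozenset[str]


def discordant(records: list[CpsRecord], gene: str, sugar: str) -> list[str]:
    """Names of serotypes where presence of `gene` and presence of `sugar` disagree.

    An empty result means presence/absence is perfectly correlated in `records`.
    """
    return [r.serotype for r in records if (gene in r.genes) != (sugar in r.sugars)]


if __name__ == "__main__":
    toy = [
        CpsRecord("toy1", frozenset({"wchA", "wzy"}), frozenset({"Glc", "Gal"})),
        CpsRecord("toy2", frozenset({"wciI", "wzy"}), frozenset({"GalNAc"})),
        CpsRecord("toy3", frozenset({"wchA"}), frozenset({"Rha"})),
    ]
    print(discordant(toy, "wchA", "Glc"))  # ['toy3']
```

Returning the discordant serotypes, rather than a single yes/no answer, makes it easy to inspect exceptions such as serotype 1. In that serotype, no initial transferase has been identified.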
Where there is ambiguity due to two residues of the initial sugar in the repeat unit, the polymerase linkage can be provisionally identified by considering the linkage catalysed by other members of the same Wzy HG. The predictions for initial sugars, and subsequent repeat-unit polymerisation linkage, correlate well with the polymerase HGs (Table S2). There are 32 polymerase HGs associated with WchA, five with WciI, four with WcjG and one with WcjH. These associations are mostly exclusive, with only five polymerase HGs associated with two initial transferases. In such cases, the linkages involve the same acceptor sugar anomerism (α or β isomer) and the same or a closely related donor sugar. This adds strong support to the inferences drawn for the specificity of the initial transferases. Relating cps Genes with CPS Structure and Serological Profile The availability of all of the annotated cps sequences allowed us to look for correlations between genes, known CPS structures, and serology (gene clusters, CPS structures, and antigenic formulae are summarised in Figure S1 and Table S3). In this way, we can attempt both to infer gene function and, by comparing related cps loci, to account for differences in CPS structure and serology. Variations between cps loci range from two base substitutions for 18B and 18C to wholesale differences in gene complement. Within this range, the variations likely to have a phenotypic effect include gene inactivation due to single base substitutions generating a premature stop codon, single base insertion/deletions leading to translational frameshifts, change of sequence leading to change of enzyme specificity, recombination or IS element insertion leading to gene truncation, and insertion/deletion/replacement of single and multiple genes. Within serogroups, the genetic differences were often subtle but were also sometimes surprisingly prominent. 
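Two of the inactivating changes listed above can be checked mechanically on a coding sequence: premature stop codons and frameshifting insertions/deletions. The sketch below makes several assumptions. It uses the standard bacterial stop codons (TAA, TAG, TGA) and a CDS that starts in frame 0. It uses toy sequences and does not model start-codon choice or reading in other frames. The function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def first_stop(cds: str) -> Optional[int]:
    """0-based index of the first in-frame (frame 0) stop codon, or None."""
    s = cds.upper()
    for i in range(0, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop occurs before the last complete codon."""
    stop = first_stop(cds)
    last_codon = (len(cds) // 3 - 1) * 3
    return stop is not None and stop < last_codon


def shifts_frame(indel_length: int) -> bool:
    """An insertion (+n) or deletion (-n) shifts the downstream frame unless n is a multiple of 3."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    wild = "ATGCTAAGCTAA"               # ATG CTA AGC TAA: stop only at the end
    deleted = wild[:3] + wild[4:]       # single-base deletion -> ATG TAA GCT AA
    print(has_premature_stop(wild), has_premature_stop(deleted))  # False True
    print(shifts_frame(-1), shifts_frame(2), shifts_frame(6))     # True True False
```

A single-base deletion (indel length of minus one), like the guanine deletion reported below for wcjE in 9A, always shifts the frame. A repeat of 2-nt units, like the TA repeat in wciZ of serogroup 15, keeps the downstream frame only when the net number of units gained or lost is a multiple of three.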
Comparisons also revealed strong commonalities between the cps loci of different serogroups and serotypes. Illustrative examples that demonstrate how structure, genetics, and serology were combined to analyse the cps loci are shown in Figure 2 and are discussed below. Serogroup 9 Previously described CPS structures [31] for all four serotypes of serogroup 9 show only subtle differences and provide an example of multiple serotypes arising by divergence from a single cps locus. Their cps genes fall into two pairs, with 9A highly similar to 9V [32] and 9L highly similar to 9N. The two pairs differ significantly in sequence (Figure 2), suggesting an initial divergence to form two ancestral serotypes. This split correlates with a difference at residue 5 of the repeat unit, where 9L and 9N CPSs have GlcpNAc, whereas 9A and 9V have Glcp. Factor serum 9d reacts with 9A and 9V but not with 9L and 9N, suggesting that it recognises Glcp but not GlcpNAc. Both are housekeeping sugars, and their differential incorporation is likely to be due to divergent forms of glycosyl transferase WcjC. Subsequently, one of these ancestral serotypes diverged to form 9L and 9N, the latter becoming unique within the group in having Glcp rather than Galp as residue 3 in the repeat unit. Their dexB–aliA loci have the same gene complement, and within the cps genes there are only 79 nucleotide differences. The highest number of amino acid substitutions (13) is within glycosyl transferase WcjA; ten are unique to 9N and presumably result in its altered specificity for Glcp rather than for Galp. The other ancestral serotype gave rise to 9V and 9A, which differ from each other only in their CPS acetylation; the former CPS has an O-acetylation pattern unique in the serogroup.
This is likely due to the O-acetyl transferase–encoding wcjE gene. It is intact and apparently functional in 9V, disrupted by a frameshift mutation in 9A (deletion of guanine, nucleotide 726), and truncated in 9L and 9N by the insertion of an IS element. Interestingly, factor serum 9g reacts only with serotype 9V and may recognise an acetyl-based epitope determined by wcjE. Serogroup 9 cps loci also differ by the insertion, in 9A and 9V relative to 9L and 9N, of an O-acetyl transferase gene (wcjD) and an adjacent IS element. This correlates with recent nuclear magnetic resonance data (I. C. Skovsted, unpublished data), which indicate that 9A CPS is partially acetylated. Serotypes 44 and 46 Are Related to Serogroup 12 The cps gene clusters of serogroup 12 and serotypes 44 and 46 are almost identical, differing only in IS transposase genes. They provide an example of common ancestry that is not apparent from serology. Structures have been determined for serotypes 12F and 12A only, although the individual constituents for serotype-46 CPS are known and all are present in 12F and 12A [31]. No factor serum cross-reacts with all five serotypes, but serological reactions do indicate antigenic commonalities [4]: 44 cross-reacts with factor sera 12b and 12d, while 46 cross-reacts with 12c. Given the cps similarities, the significant differences between 12F and 12A CPS are perhaps surprising. 12A has a GalpNAc side branch and 12F a Galp side branch, and the first main-chain residue is GalpNAc in 12F and GlcpNAc in 12A. The nucleotide differences are concentrated within two glycosyl transferase genes (wciI and wcxB). We predict that the initial transferases WciI–12A and WciI–12F, which differ at 38 amino acids, link GlcpNAc and GalpNAc, respectively, to the lipid carrier. We further predict that WcxB–12A and WcxB–12F, which differ at 17 amino acids, account for the side-branch difference.
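Some counts in this section come from position-by-position comparison of aligned sequences. Examples are the 79 nucleotide differences between the 9L and 9N loci, the WcjA substitutions unique to 9N, and the amino acid differences between the 12A and 12F transferases. The Methods name EMBOSS Diffseq for the nucleotide comparison. The sketch below is a simplified stand-in and does not reimplement Diffseq. It assumes the sequences are already aligned to equal length, with gaps written as `-` and treated as ordinary characters. The function names and toy sequences are hypothetical.

```python
def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """(position, residue_in_a, residue_in_b) for every mismatched column of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def unique_positions(target: str, others: list[str]) -> list[int]:
    """Columns where `target` carries a residue found in none of the other aligned sequences."""
    if any(len(o) != len(target) for o in others):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, r in enumerate(target) if all(o[i] != r for o in others)]


if __name__ == "__main__":
    print(differences("ACGTAC", "ACTTAG"))               # [(2, 'G', 'T'), (5, 'C', 'G')]
    print(unique_positions("MKLVA", ["MKIVA", "MRIVA"]))  # [2]
```

`len(differences(a, b))` gives the kind of total quoted for 9L versus 9N. `unique_positions` captures the narrower "unique to one serotype" criterion applied to the WcjA substitutions.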
Serotype 14 Is Closely Related to Serogroup 15 Serotype 14 shares no significant serological cross-reaction with serogroup 15, or with any other serotype, but the cps loci of these two serotypes are clearly related. All CPS structures for serotype 14 and serogroup 15 are known [31,33,34], and comparisons of structures and genes allow inferences about one to be made from the other. The four serogroup 15 pentasaccharide repeat units are identical, but polymerisation forms a linear polymer in 15A and 15F and a branched structure in 15B and 15C that correlates with the presence of wzy genes of different HGs (see Table S2). Serotypes 15B and 15C differ in the presence or absence of O-acetylation [35] and, as previously described [36], the difference is due to a variable-length TA tandem repeat region at the 5′ end of wciZ—in frame in 15B and out of frame in 15C strains. This gene is in frame in 15F (acetylated), but extensively degraded, rather than simply out of frame, in 15A (not acetylated). Genes for synthesis of glycerol-2-phosphate (gtp1, gtp2, and gtp3) are present in all serogroup 15 cps loci, but glycerol was reported to be present only in 15A, being replaced by choline-P in 15F, 15B, and 15C, with either residue being present on only a proportion of the repeat units [31]. In all cases, the transferase is presumed to be encoded by wchX, with the molecular basis of the structural polymorphism being contentious. However, recent nuclear magnetic resonance analysis indicates that 15B contains glycerol and not choline, suggesting that the same may also be true for 15F and 15C [34]. The 3′ end of 15F cps has four extra genes—rmlB, rmlD, glf, and a putative acetyl transferase gene wcjE—but they appear to have no effect on the structure as there is no rhamnose, galactofuranose, or extra acetylation in 15F CPS. Indeed, rmlA and rmlC would also be required for rhamnose biosynthesis. 
These four genes show synteny with the 3′ end of cps in several serotypes, particularly serotype 31, and their arrangement in 15F may indicate a recombination event. The serotype 14 [28,37,38] and basic 15 cps gene clusters clearly share common ancestry and differ only at the 3′ end, where the glycerol-2-phosphate–related genes in 15 are replaced in 14 by a gene (lrp) encoding a large (1,359 amino acid) repetitive protein, which correlates well with CPS structures [36]. The type-14 repeat unit most resembles the branched form of 15B and 15C, with the lack of O-acetylation due to the absence of wciZ. The lack of α-D-galactose is probably due to degradation of the relevant transferase gene, wchN. The large repetitive protein encoded by serotype-14 cps has a hydrophobic C-terminal region, suggesting that it may be anchored to the cell surface. This leads us to speculate that Lrp may serve as a dominant antigen that overwhelms the serological similarities to serogroup 15 that should be evident from their very similar repeat units. Discussion Several bacterial pathogens exist as a large number of antigenic variants because of differences in the polysaccharides presented at the cell surface. However, the sequencing and analysis of the cps loci of pneumococci described here are believed to provide the only such case where the whole gene repertoire is available, allowing genetics, chemistry, and immunology to be combined to predict the role of cps genes. This combined approach has allowed the confident prediction of most gene functions, but it has also highlighted the limitations where subtle sequence changes may alter enzyme substrate specificity. Analysis of the cps loci indicates that a number of different mechanisms have generated antigenic diversity in CPSs. 
Some of these involve the divergence of a single serotype into two related serotypes by the accumulation of point mutations (e.g., serogroup 6 [39]), or the insertion or deletion of a single gene, resulting in slightly different CPS structures (e.g., serogroup 18). In other cases, the cps loci of some serotypes within a serogroup seem to be virtually unrelated and probably reflect the sharing of a dominant epitope that led to them being placed within the same serogroup (e.g., serogroups 7, 17, 33, and 35). Similarly, some serotypes placed in different serogroups show more relatedness among their cps loci than those within the same serogroup (e.g., types 7B and 7C are more closely related to type 40 than to 7A and 7F). This is perhaps not surprising as serogroups were defined by common epitopes in the absence of any knowledge of the CPS structures or the cps sequences that code for their synthesis. Shared immunodominant epitopes will lead to inclusion in the same serogroup even if there are major differences in other parts of the structure and hence in the cps. A striking feature of the cps loci is the presence of many highly divergent forms of each of the key enzyme classes. Thus, there are 40 HGs for polysaccharide polymerases, 13 groups of flippases, and a great diversity of transferases. The presence of multiple non-homologous or highly divergent forms of these enzymes, together with the low percentage G+C content of the region in which these are encoded, supports the view that these genes have been imported into pneumococci (or their ancestors) on multiple occasions from different and unknown sources. The plethora of transferases in the pneumococcal cps loci provides an opportunity to continually generate new serotypes by gene shuffling, but there are no clear examples of serotypes arising as mosaics of two existing cps loci. 
One barrier to the frequent appearance of new serotypes by recombination is a lack of homology between the serotype-specific regions of cps loci of different serogroups. The appearance of new serotypes may also be limited by a need to change multiple cps genes; rare genetic events that create mosaics between existing cps loci probably typically fail to produce a capsule since new repeat units resulting from the capture of novel transferases are unlikely to be recognised as substrates by the resident repeat-unit polymerase. The cps sequences, and their associated polysaccharide structures and serological profiles, constitute an extensive dataset that, through further detailed analysis, will allow a clearer understanding of capsule biochemistry, genetics, and evolution and will precipitate advances in molecular serotyping of pneumococci [40,41]. Materials and Methods Strain selection, serotyping, and genomic DNA isolation. Representative strains of the 90 S. pneumoniae serotypes were selected from among the lyophilised strains in the strain collection of the World Health Organization Collaborating Centre for Reference and Research on Pneumococci, Statens Serum Institut (Copenhagen, Denmark) (Table S4). The strains were serotyped and cultured, and genomic DNA was extracted by standard methods [3,4,42]. PCR and DNA sequencing. PCR reactions were performed using the Expand Long Template PCR System (Roche, Basel, Switzerland), which contains proof-reading thermostable polymerases. Initial reactions used primers CPS1 (TTGCCAATGAAGAGCAAGACTTGACAGTAG) and CPS2 (CAATAATGTCACGCCCGCAAGGGCAAGT) [26]. Where these failed to produce an adequate product, further reactions were attempted using alternative dexB-specific primers (CPS1A [CGACCGTCGCTTCCTAGTTGTGGCTAAC] or PCPS3f [CACACAGAAAGCATCCCATGG]) and aliA-specific primers (CPS1B [GTCTTGAGCTTTGACTGCCGCGTATTCT] or PCPS3r [GAGACAGACCTGATAACCTCAACTATTTG]). 
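The amplification of the dexB–aliA region can be checked in silico on a known template before ordering primers. The sketch below is a minimal exact-match check. It treats CPS1 as the forward primer and CPS2 as the reverse primer, which is an assumption about their orientation. It reports the expected product length, and the `amplicon_size` helper is hypothetical. Real PCR tolerates some primer mismatches and depends on annealing conditions, which this sketch ignores. The product sizes reported in the study (10,337 to 30,298 bp) were measured, not computed this way.

```python
from typing import Optional

# Primer sequences as listed in the Methods (5'->3'); orientation assumed here.
CPS1 = "TTGCCAATGAAGAGCAAGACTTGACAGTAG"
CPS2 = "CAATAATGTCACGCCCGCAAGGGCAAGT"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_size(template: str, fwd: str, rev: str) -> Optional[int]:
    """Exact-match in silico PCR on the top strand of `template`.

    The forward primer must occur verbatim, and the reverse primer must occur
    downstream as its reverse complement. Returns the product length in bp, or None.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start == -1:
        return None
    site = t.find(reverse_complement(rev.upper()), start + len(fwd))
    if site == -1:
        return None
    return site + len(rev) - start


if __name__ == "__main__":
    # Toy template: CPS1, 100 filler bases, then the binding site of CPS2.
    toy = CPS1 + "N" * 100 + reverse_complement(CPS2)
    print(amplicon_size(toy, CPS1, CPS2))  # 158 (30 + 100 + 28)
```

The product length includes both primers. That is why the toy result is the filler length plus 30 nt for CPS1 and 28 nt for CPS2.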
The cps cluster for our serotype-5 strain was amplified using a primer based on the EMBL file (AY336008) specific for the wzg gene (CPS05F [CGTTCACAGAAAGTGAAGCG]) in combination with PCPS3r. PCR products spanning the cps locus were used directly to construct small-insert libraries [43], with 1- to 2-kb inserts in pUC18. Clones from each library were sequenced from each end using Big-Dye terminator chemistry (Applied Biosystems, Foster City, California, United States) on ABI3730 sequencing machines, to give an average of 8- to 10-fold coverage of each product. These reads were assembled with Phrap (CodonCode, Dedham, Massachusetts, United States). Any gaps or regions of poor coverage were re-sequenced by primer-directed sequencing directly from the original PCR product using Big-Dye primer chemistry (Applied Biosystems). This sequencing procedure should prevent any PCR errors from being represented in the final consensus sequence. Annotation and bioinformatic methods. Gene prediction and annotation were performed as previously described [44]. Predicted proteins were clustered into homology groups using TribeMCL (Centre for Mathematics and Computer Science and EMBL-EBI) [45] with a cut-off of 1e−50. Genes within the cps loci that encoded proteins within the same homology group were assigned the same name. The exceptions were the polymerases and flippases, for which we used the prior gene nomenclature, wzy and wzx, even though in both cases there were multiple homology groups. Alignment of gene clusters was performed using the Artemis Comparison Tool (Sanger Institute, Hinxton, United Kingdom). Nucleotide differences were identified using the EMBOSS program Diffseq (MRC Rosalind Franklin Centre for Genomics Research, Hinxton, United Kingdom) [46]. Supporting Information Figure S1 Capsule Biosynthesis Genes and Repeat-Unit Polysaccharide Structure for All 90 Serotypes (9.9 MB TIF)
Figure S2 Biosynthesis Pathways for Non-Housekeeping Sugars (50 KB PPT)

Table S1 Homology Groups including Numbers of Members and Product Description. Proteins in different homology groups are so divergent that they are highly unlikely to have diverged from a common streptococcal ancestor. (306 KB DOC)

Table S2 Associations between Initial Transferases and Wzy Polymerase Groups. The proposed Wzy groupings represent a sequential numbering of homology groups and are shown on the structural diagrams. (104 KB DOC)

Table S3 Type Designations and Antigenic Formulae for the 90 Serotypes of S. pneumoniae. The antigenic formulae are arbitrary designations of cross-reactions as seen in the capsular reaction. (72 KB DOC)

Table S4 Type and Strain Designations for the 90 Strains of S. pneumoniae Analysed (68 KB DOC)

Accession Numbers

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), GenBank (http://www.ncbi.nlm.nih.gov/Genbank), and DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/Welcome-e-html) accession numbers for the sequences reported in this paper for the capsular biosynthetic genes of the 90 serotypes of S. pneumoniae are CR931632–CR931722. The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) accession number for the wzg gene is AY336008. The Pfam domain (http://www.sanger.ac.uk/cgi-bin/Pfam) for WchA, WciI, WcjG, and WcjH is PF02397.
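The 8- to 10-fold average coverage quoted in the Methods follows the standard Lander-Waterman relation C = NL/G, where N is the number of reads, L the read length, and G the length of the target (here, one cps PCR product). The Python sketch below applies that relation. The product and read lengths in its example are assumed values chosen for illustration, not figures from this study, and the function names are hypothetical. Its estimate of uncovered sequence, exp(−C), treats reads as landing uniformly at random; real small-insert libraries are less even, which is consistent with the need to close gaps and poorly covered regions by primer-directed sequencing.

```python
import math


def fold_coverage(n_reads: int, read_length: int, target_length: int) -> float:
    """Average depth C = N * L / G."""
    return n_reads * read_length / target_length


def reads_for_coverage(fold: float, read_length: int, target_length: int) -> int:
    """Smallest N whose total read length reaches `fold` times the target."""
    return math.ceil(fold * target_length / read_length)


def expected_uncovered_fraction(fold: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases hit by no read."""
    return math.exp(-fold)


if __name__ == "__main__":
    # Assumed example values, NOT taken from the paper.
    product_bp = 20_000  # hypothetical length of one cps PCR product
    read_bp = 700        # hypothetical usable Sanger read length
    for fold in (8, 10):
        n_reads = reads_for_coverage(fold, read_bp, product_bp)
        n_clones = math.ceil(n_reads / 2)  # each clone is read from both ends
        gap_bp = expected_uncovered_fraction(fold) * product_bp
        print(f"{fold}x: {n_reads} reads (~{n_clones} clones), "
              f"~{gap_bp:.1f} bp expected uncovered")
```

With these assumed values, 8-fold coverage corresponds to 229 reads, or roughly 115 clones read from both ends.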
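The homology-grouping step described in the Methods can also be sketched. TribeMCL builds a similarity graph from all-against-all BLAST E-values and partitions it with the Markov cluster (MCL) algorithm. The Python sketch below substitutes plain single-linkage clustering (connected components via union-find) for MCL, so it illustrates how the 1e−50 cut-off decides which proteins are linked, not how TribeMCL itself splits the graph; single linkage can merge groups that MCL would keep apart. The function name, protein identifiers, and E-values are hypothetical.

```python
from collections import defaultdict

# E-value cut-off quoted in the Methods.
E_VALUE_CUTOFF = 1e-50


def homology_groups(proteins, hits, cutoff=E_VALUE_CUTOFF):
    """Group proteins connected by any hit with E-value <= cutoff.

    proteins: iterable of protein identifiers.
    hits: iterable of (query, subject, evalue) tuples, e.g. parsed from an
          all-against-all BLASTP search.
    Returns sorted groups, largest first (ties broken alphabetically).
    NOTE: single-linkage stand-in for MCL; this is not TribeMCL.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Ignore weak hits and hits to proteins outside the input set.
        if evalue > cutoff or query not in parent or subject not in parent:
            continue
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q  # merge the two groups

    members = defaultdict(list)
    for p in parent:
        members[find(p)].append(p)
    return sorted((sorted(g) for g in members.values()),
                  key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Hypothetical identifiers and E-values, for illustration only.
    proteins = ["p1", "p2", "p3", "p4", "p5"]
    hits = [
        ("p1", "p2", 1e-80),
        ("p2", "p3", 1e-20),  # weaker than the cut-off: p3 stays separate
        ("p4", "p5", 0.0),    # BLAST reports very strong hits as 0.0
    ]
    for group in homology_groups(proteins, hits):
        print(group)
```

In this example p1 and p2 form one group, p4 and p5 another, and p3 is left on its own because its only hit is weaker than the cut-off.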

                Author and article information

                Journal: mBio
                Publisher: American Society for Microbiology
                ISSN: 2150-7511
                Dates: July 01 2014; August 29 2014; July 22 2014; July 22 2014
                Volume 5, Issue 4
                Article DOI: 10.1128/mBio.01490-14
                © 2014