
      A Global Overview of the Genetic and Functional Diversity in the Helicobacter pylori cag Pathogenicity Island



          The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI-carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity, but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 29 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes. This indicates that the cagPAI was probably acquired only once by H. pylori. It also indicates that its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes showed signatures of diversifying (Darwinian) selection at more than 5% of codons, in particular genes encoding cell-surface exposed proteins. Several gene products of unknown function that are predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, comprises two genetic lineages, like cagA. It is therefore predicted to encode either a new secreted candidate effector protein or a protein that interacts with CagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.
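The criterion "signatures of diversifying selection at more than 5% of codons" can be illustrated with a short Python sketch. It assumes that per-codon posterior probabilities have already been estimated with a codon-model program (PAML is one such package). These are the probabilities that the nonsynonymous/synonymous rate ratio (omega) exceeds 1. The 0.95 posterior cutoff, the function names and the toy input are hypothetical and are not taken from the authors' analysis.

```python
def fraction_under_selection(posteriors, cutoff=0.95):
    """Fraction of codons whose posterior P(omega > 1) is at least `cutoff`.

    `posteriors` is a hypothetical per-codon list of probabilities; the 0.95
    cutoff is an assumed convention, not a value reported in the article.
    """
    if not posteriors:
        raise ValueError("need at least one codon")
    flagged = sum(1 for p in posteriors if p >= cutoff)
    return flagged / len(posteriors)


def shows_diversifying_signature(posteriors, cutoff=0.95, min_fraction=0.05):
    """True if strictly more than `min_fraction` of codons are flagged (5% by default)."""
    return fraction_under_selection(posteriors, cutoff) > min_fraction


# Toy gene of 100 codons, 6 of which have strong support for omega > 1.
toy = [0.99] * 6 + [0.10] * 94
print(fraction_under_selection(toy))      # 0.06
print(shows_diversifying_signature(toy))  # True
```

With 5 flagged codons out of 100, the sketch returns False, because the threshold is "more than" 5%.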

          Author Summary

          Most humans are infected with Helicobacter pylori. The H. pylori cag pathogenicity island (cagPAI) encodes a secretion apparatus that can translocate the CagA protein into host cells. Humans infected with cagPAI-carrying H. pylori are at increased risk of severe disease, including gastric cancer. We analyzed the nucleotide sequences and functional diversity of the cagPAI in a globally representative collection of isolates. Complete cagPAI sequences were obtained for 29 strains from all known H. pylori biogeographic populations. The gene content and arrangement of the cagPAI, and its function, were highly conserved. Diversity in most cag genes consisted largely of synonymous polymorphisms. However, some genes had particularly high frequencies of non-synonymous polymorphisms, suggesting that they were under diversifying selection. These were in particular genes encoding proteins predicted to be secreted or located on the outside of the bacterial cell. Our study provides evidence that the cagPAI was acquired only once. It also provides an important resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.
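A simplified Python sketch makes the distinction between synonymous and non-synonymous polymorphisms concrete. It compares two aligned, ungapped coding sequences codon by codon under the standard genetic code. Each differing codon is labelled synonymous (same amino acid) or non-synonymous (different amino acid). The sketch does not correct for the number of possible sites or for multiple substitutions, so it is not a dN/dS estimate and not the authors' method. The function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a, seq_b):
    """Count identical, synonymous and non-synonymous codon pairs.

    Assumes (hypothetically) two in-frame, equal-length, ungapped DNA
    sequences; gap characters or ambiguous bases raise a KeyError.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0}
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if ca == cb:
            counts["identical"] += 1
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            counts["synonymous"] += 1      # e.g. CTG -> CTC, both leucine
        else:
            counts["nonsynonymous"] += 1   # e.g. AAA (Lys) -> AGA (Arg)
    return counts


print(classify_codon_differences("ATGCTGAAA", "ATGCTCAGA"))
# {'identical': 1, 'synonymous': 1, 'nonsynonymous': 1}
```

A gene in which most differences fall into the synonymous bin matches the pattern described for most cag genes. A surplus in the non-synonymous bin is the raw signal that formal codon models then test for diversifying selection.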


                Author and article information

                PLoS Genetics (PLoS Genet)
                Public Library of Science (San Francisco, USA)
                Published: 19 August 2010
                Volume 6, Issue 8
                [1] Institute for Medical Microbiology and Hospital Epidemiology, Medizinische Hochschule Hannover, Hannover, Germany
                [2] Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany
                [3] Applied Maths, Sint-Martens-Latem, Belgium
                [4] Environmental Research Institute, University College Cork, Cork, Ireland
                [5] Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
                Editor affiliation: Fred Hutchinson Cancer Research Center, United States of America
                Author notes

                Current address: Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany


                Current address: Konrad Lorenz Institute for Ethology, Vienna, Austria

                Conceived and designed the experiments: PO CJ SS MA BL. Performed the experiments: PO CJ CS BL. Analyzed the data: PO CJ YM MU MA BL. Contributed reagents/materials/analysis tools: MV SS. Wrote the paper: CJ SS MA BL.

                Copyright: Olbermann et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Pages: 17
                Research Article
                Genetics and Genomics/Microbial Evolution and Genomics
                Infectious Diseases/Gastrointestinal Infections
                Microbiology/Cellular Microbiology and Pathogenesis


