The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI-carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity, but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncoprotein CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences of 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori and that its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several gene products of unknown function that are predicted to be under Darwinian selection are also likely to be secreted proteins (e.g., HP0522, HP0535). One of these, HP0535, is predicted to encode either a new secreted candidate effector protein or a protein that interacts with CagA because, like cagA, it falls into two genetic lineages. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.
Most humans are infected with Helicobacter pylori. The H. pylori cag pathogenicity island (cagPAI) encodes a secretion apparatus that can translocate the CagA protein into host cells. Humans infected with cagPAI-carrying H. pylori are at increased risk of severe disease, including gastric cancer. We analyzed the nucleotide sequences and functional diversity of the cagPAI in a globally representative collection of isolates. Complete cagPAI sequences were obtained for 38 strains from all known H. pylori biogeographic populations. The gene content and arrangement of the cagPAI, as well as its function, were highly conserved. Diversity in most cag genes consisted largely of synonymous polymorphisms. However, some genes, in particular those that encode proteins predicted to be secreted or located on the outside of the bacterial cell, had particularly high frequencies of non-synonymous polymorphisms, suggesting that they were under diversifying selection. Our study provides evidence that the cagPAI was acquired only once, and it offers an important resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.