
      Comparative genomic analysis of Helicobacter pylori from Malaysia identifies three distinct lineages suggestive of differential evolution



          The discordant prevalence of Helicobacter pylori and its related diseases has long produced enigmatic situations in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand the dynamics of host–pathogen interaction and genome evolution. In this study, we extensively analyzed and compared the genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in genome content would provide significant insights into the adaptive and host modulation strategies harnessed by H. pylori to persist effectively in a host-specific manner.


                Author and article information

                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                09 January 2015
                01 December 2014
                Volume: 43
                Issue: 1
                Pages: 324-335
                [1] Pathogen Biology Laboratory, Department of Biotechnology and Bioinformatics, University of Hyderabad, Gachibowli, Hyderabad, 500046, India
                [2] Department of Medical Microbiology, Faculty of Medicine, University of Malaya, 50603, Kuala Lumpur, Malaysia
                [3] Department of Medicine, Faculty of Medicine, University of Malaya, 50603, Kuala Lumpur, Malaysia
                [4] School of Pathology and Laboratory Medicine, University of Western Australia, Nedlands 6009, Western Australia, Australia
                [5] Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
                [6] Institute of Biological Sciences, University of Malaya, 50603, Kuala Lumpur, Malaysia
                Author notes
                [*] To whom correspondence should be addressed. Tel: +91 40 23134585; Fax: +91 40 23134585; Email: niyaz.ahmed@ ; ahmed.nizi@
                © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@

                Page count
                Pages: 12


