A Semi-Quantitative, Synteny-Based Method to Improve Functional Predictions for Hypothetical and Poorly Annotated Bacterial and Archaeal Genes

Abstract

During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.

Author Summary

Based on trends between gene sequence divergence and gene order divergence over time, we developed a new synteny-based method to refine functional annotation. This method uses these trends to determine the probability that any two syntenous genes (genes that are sequential in two organisms) are functionally related. Organisms that are distant relatives have few syntenous genes, but these syntenous genes have a very high probability of functional relatedness. We applied this method to newly assembled genomes of co-occurring, uncultivated acid mine drainage Archaea in order to improve their gene annotations. This application revealed important physiological differences between the co-occurring organisms in this clade, including the ability of some but not all of the Archaea to manufacture vitamin B12 and to carry out anaerobic energy metabolism. We also used this method to identify new genes possibly involved in vitamin B12 synthesis, ether lipid synthesis, molybdopterin synthesis and utilization, and microbial immunity through the CRISPR system.
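The notion of syntenous genes used above (genes that are sequential in two organisms) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the ortholog mapping, and the gene identifiers are hypothetical, gene order is treated as a simple linear list, and strand and contig boundaries are ignored.

```python
from typing import Dict, List, Set, Tuple


def adjacent_pairs(gene_order: List[str]) -> Set[frozenset]:
    """Return the unordered neighbouring gene pairs of a linear gene order."""
    return {frozenset((a, b)) for a, b in zip(gene_order, gene_order[1:])}


def syntenous_pairs(
    order_a: List[str],
    order_b: List[str],
    orthologs: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Adjacent gene pairs in genome A whose orthologs are adjacent in genome B.

    `orthologs` maps a gene in A to its (assumed, precomputed) ortholog in B.
    """
    neighbours_b = adjacent_pairs(order_b)
    found = []
    for g1, g2 in zip(order_a, order_a[1:]):
        o1, o2 = orthologs.get(g1), orthologs.get(g2)
        if o1 is None or o2 is None:
            continue  # at least one gene has no ortholog in genome B
        if frozenset((o1, o2)) in neighbours_b:
            found.append((g1, g2))
    return found


def synteny_fraction(
    order_a: List[str],
    order_b: List[str],
    orthologs: Dict[str, str],
) -> float:
    """Share of adjacent pairs in A (both genes having orthologs) kept adjacent in B."""
    candidates = [
        (g1, g2)
        for g1, g2 in zip(order_a, order_a[1:])
        if g1 in orthologs and g2 in orthologs
    ]
    if not candidates:
        return 0.0
    return len(syntenous_pairs(order_a, order_b, orthologs)) / len(candidates)


# Toy, made-up gene orders used only to exercise the functions.
genome_a = ["a1", "a2", "a3", "a4"]
genome_b = ["b3", "b9", "b2", "b1", "b4"]
ortholog_map = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}

print(syntenous_pairs(genome_a, genome_b, ortholog_map))
print(synteny_fraction(genome_a, genome_b, ortholog_map))
```

In this toy case only the pair a1/a2 stays adjacent in the second genome. A per-genome-pair fraction of this kind is one plausible quantity to compare against sequence divergence, though the measure actually used in the paper may be defined differently.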

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (publisher-id): plos

Journal ID (pmc): ploscomp

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date Collection: October 2011

Publication date (Print): October 2011

Publication date (Electronic): 20 October 2011

Volume: 7

Issue: 10

Electronic Location Identifier: e1002230

Affiliations

[1] Department of Environmental Science, Policy, and Management, University of California, Berkeley, California, United States of America

[2] Department of Earth and Planetary Sciences, University of California, Berkeley, California, United States of America

[3] Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America

[4] Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America

University of California Davis, United States of America

Author notes

* E-mail: jbanfield@berkeley.edu

¤a: Current address: Josephine Bay Paul Center for Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts, United States of America.

¤b: Current address: Department of Environment and Agro-Biotechnologies, Centre de Recherche Public – Gabriel Lippmann, Belvaux, Grand-Duchy of Luxembourg.

Conceived and designed the experiments: APY JFB. Performed the experiments: APY BCT SLS PW AZ MPT NJ. Analyzed the data: APY. Contributed reagents/materials/analysis tools: BCT. Wrote the paper: APY BCT JFB.

Article

Publisher ID: PCOMPBIOL-D-11-00418

DOI: 10.1371/journal.pcbi.1002230

PMC ID: 3197636

PubMed ID: 22028637

SO-VID: 1a4bf0fe-f4e4-4add-9ee5-c9f0bd7aa271

Copyright © Yelton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 29 March 2011

Date accepted: 30 August 2011

Page count

Pages: 12
