A novel method for accurate operon predictions in all sequenced prokaryotes

Price, Morgan N.; Huang, Katherine H.; Alm, Eric J; Arkin, Adam P

doi:10.1093/nar/gki232


research-article

Author(s): Morgan N. Price ¹, Katherine H. Huang ¹, Eric J. Alm ¹,*, Adam P. Arkin ¹,²,³

Publication date (Electronic): 8 February 2005

Journal: Nucleic Acids Research

Publisher: Oxford University Press


Abstract

We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85% and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E. coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E. coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes and find several surprises: H. pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC 6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
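As a rough illustration of the two kinds of evidence the abstract names, the sketch below computes the intergenic distance for adjacent same-strand gene pairs and folds it together with a conservation flag into a single score. It is not the authors' model: the `Gene` record, the `same_strand_pairs` and `operon_score` helpers, the logistic form, and every parameter value (`d0`, `scale`, `bonus`) are hypothetical choices made only for illustration, and the abstract does not specify how the published method combines its features or how it adapts to each genome.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def same_strand_pairs(genes):
    """Yield (left, right, distance) for adjacent genes on the same strand.

    distance is the number of bases between the two genes; it is negative
    when the genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    for left, right in zip(ordered, ordered[1:]):
        if left.strand == right.strand:
            yield left, right, right.start - left.end - 1


def operon_score(distance, conserved, d0=50.0, scale=30.0, bonus=1.5):
    """Toy score in (0, 1): higher means 'more likely the same operon'.

    ASSUMPTION: a logistic function with made-up parameters, not a fitted
    model. Short distances and a conserved gene pair both raise the score.
    """
    logit = (d0 - distance) / scale + (bonus if conserved else 0.0)
    logit = max(-50.0, min(50.0, logit))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + exp(-logit))


if __name__ == "__main__":
    genes = [
        Gene("geneA", 1, 100, "+"),
        Gene("geneB", 111, 200, "+"),
        Gene("geneC", 500, 600, "-"),
    ]
    for left, right, dist in same_strand_pairs(genes):
        print(left.name, right.name, dist, round(operon_score(dist, True), 3))
```

In a genome-specific setting, the distance parameters would have to be estimated for each genome rather than fixed as they are here; the abstract notes, for example, that operon genes in Halobacterium NRC-1 and H. pylori sit closer together than in E. coli.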


Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (publisher-id): Nucleic Acids Research

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date Collection: 2005

Publication date (Print): 2005

Publication date (Electronic): 8 February 2005

Volume: 33

Issue: 3

Pages: 880-892

Affiliations

¹Lawrence Berkeley National Lab, 1 Cyclotron Road, Mailstop 939R704, Berkeley, CA 94720, USA

²Howard Hughes Medical Institute, Berkeley, CA, USA

³Department of Bioengineering, University of California, Berkeley, USA

Author notes

*To whom correspondence should be addressed. Tel: +1 510 843 1794; Fax: +1 510 486 6059; Email: ejalm@lbl.gov

Article

DOI: 10.1093/nar/gki232

PMC ID: 549399

PubMed ID: 15701760

SO-VID: 5d95853c-7291-4108-baf5-d1afac64e5c5

License:

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oupjournals.org

History

Date received: 14 September 2004

Date revision received: 25 October 2004

Date accepted: 20 January 2005
