High accuracy operon prediction method based on STRING database scores

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

We present a simple and highly accurate computational method for operon prediction, based on intergenic distances and functional relationships between the protein products of contiguous genes, as defined by STRING database (Jensen,L.J., Kuhn,M., Stark,M., Chaffron,S., Creevey,C., Muller,J., Doerks,T., Julien,P., Roth,A., Simonovic,M. et al. (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416). These two parameters were used to train a neural network on a subset of experimentally characterized Escherichia coli and Bacillus subtilis operons. Our predictive model was successfully tested on the set of experimentally defined operons in E. coli and B. subtilis, with accuracies of 94.6 and 93.3%, respectively. As far as we know, these are the highest accuracies ever obtained for predicting bacterial operons. Furthermore, in order to evaluate the predictable accuracy of our model when using an organism's data set for the training procedure, and a different organism's data set for testing, we repeated the E. coli operon prediction analysis using a neural network trained with B. subtilis data, and a B. subtilis analysis using a neural network trained with E. coli data. Even for these cases, the accuracies reached with our method were outstandingly high, 91.5 and 93%, respectively. These results show the potential use of our method for accurately predicting the operons of any other organism. Our operon predictions for fully-sequenced genomes are available at http://operons.ibt.unam.mx/OperonPredictor/.

Related collections

Most cited references 30

Record: found
Abstract: found
Article: not found

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

M Remm, C. E. V. Storm, E. Sonnhammer (2001)

Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and e.g. invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins, and were compared to clusters derived by manual tree-based ortholog detection methods. This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods.A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location. Copyright 2001 Academic Press.

0 comments Cited 503 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation

Socorro Gama-Castro, Verónica Jiménez-Jacinto, Martin Peralta-Gil … (2008)

RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database offering curated knowledge of the transcriptional regulatory network of Escherichia coli K12, currently the best-known electronically encoded database of the genetic regulatory network of any free-living organism. This paper summarizes the improvements, new biology and new features available in version 6.0. Curation of original literature is, from now on, up to date for every new release. All the objects are supported by their corresponding evidences, now classified as strong or weak. Transcription factors are classified by origin of their effectors and by gene ontology class. We have now computational predictions for σ54 and five different promoter types of the σ70 family, as well as their corresponding −10 and −35 boxes. In addition to those curated from the literature, we added about 300 experimentally mapped promoters coming from our own high-throughput mapping efforts. RegulonDB v.6.0 now expands beyond transcription initiation, including RNA regulatory elements, specifically riboswitches, attenuators and small RNAs, with their known associated targets. The data can be accessed through overviews of correlations about gene regulation. RegulonDB associated original literature, together with more than 4000 curation notes, can now be searched with the Textpresso text mining engine.

0 comments Cited 178 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

A novel method for accurate operon predictions in all sequenced prokaryotes

Morgan N. Price, Katherine Huang, Eric Alm … (2005)

We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E.coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E.coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H.pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC 6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.

0 comments Cited 141 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date Collection: July 2010

Publication date (Print): July 2010

Publication date (Electronic): 12 April 2010

Publication date PMC-release: 12 April 2010

Volume: 38

Issue: 12

Page: e130

Affiliations

¹Centro de Ciencias Aplicadas y Desarrollo Tecnológico, Universidad Nacional Autónoma de México, México, D.F., ²Instituto de Ingeniería, Ciudad Universitaria, Universidad Nacional Autónoma de México, México, D.F. and ³Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México

Author notes

*To whom correspondence should be addressed. Tel: +52 777 3291634; Fax: 52 777 3172388; Email: merino@ 123456ibt.unam.mx

Article

Publisher ID: gkq254

DOI: 10.1093/nar/gkq254

PMC ID: 2896540

PubMed ID: 20385580

SO-VID: ae79a248-89bb-4960-a5f1-4a18e743630d

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 11 January 2010

Date revision received : 2 March 2010

Date accepted : 28 March 2010

Comments

Comment on this article

scite_

Cited by 23

See all cited by

Most referenced authors 1,123

See all reference authors

High accuracy operon prediction method based on STRING database scores

Read this article at

Abstract

Related collections

Genome Engineering using CRISPR

Most cited references 30

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation

A novel method for accurate operon predictions in all sequenced prokaryotes

Author and article information

Journal

Affiliations

Author notes

Article

History

Categories

Comments

Comment on this article

Similar content 179

Cited by 23

Most referenced authors 1,123