KAAS: an automatic genome annotation and pathway reconstruction server

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

The number of complete and draft genomes is rapidly growing in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith–Waterman scores as well as by the manual curation. Each K number represents an ortholog group of genes, and it is directly linked to an object in the KEGG pathway map or the BRITE functional hierarchy. Here, we have developed a web-based server called KAAS (KEGG Automatic Annotation Server: http://www.genome.jp/kegg/kaas/) i.e. an implementation of a rapid method to automatically assign K numbers to genes in the genome, enabling reconstruction of KEGG pathways and BRITE hierarchies. The method is based on sequence similarities, bi-directional best hit information and some heuristics, and has achieved a high degree of accuracy when compared with the manually curated KEGG GENES database.

Related collections

Most cited references 5

Record: found
Abstract: found
Article: not found

Gene Ontology: tool for the unification of biology

Michael Ashburner, Catherine A. Ball, Judith Blake … (2002)

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

0 comments Cited 15210 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: not found
Article: not found

Identification of common molecular subsequences.

T.F. Smith, M.S. Waterman (1981)

0 comments Cited 1696 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

How well is enzyme function conserved as a function of pairwise sequence identity?

Jeffrey Skolnick, WEIDONG TIAN (2003)

Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the function conservation of each enzyme family and then average the degree of enzyme function conservation across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still be used as a confident threshold to transfer the first three digits of an EC number; however, to transfer all four digits of an EC number, above 60% sequence identity is needed to have at least 90% accuracy. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on the E-values from PSI-BLAST should be used with caution. We also show that by employing an enzyme family-specific sequence identity threshold above which 100% functional conservation is required, functional inference of unknown sequences can be accurately accomplished. However, this comes at a cost: those true positive sequences below this threshold cannot be uniquely identified.

0 comments Cited 126 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): July 2007

Publication date (Electronic): 25 May 2007

Publication date PMC-release: 25 May 2007

Volume: 35

Issue: Web Server issue

Pages: W182-W185

Affiliations

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan

Author notes

*To whom Correspondence should be addressed. +81 774 38 3270+81 774 38 3269 kanehisa@ 123456kuicr.kyoto-u.ac.jp

Article

DOI: 10.1093/nar/gkm321

PMC ID: 1933193

PubMed ID: 17526522

SO-VID: bce13370-943e-4e40-984c-449818cd522c

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 30 January 2007

Date revision received : 31 March 2007

Date accepted : 17 April 2007

Comments

Comment on this article

scite_

Cited by 1,657

See all cited by

Most referenced authors 1,647

See all reference authors

KAAS: an automatic genome annotation and pathway reconstruction server

Read this article at

Abstract

Related collections

Genome Integrity

Most cited references 5

Gene Ontology: tool for the unification of biology

Identification of common molecular subsequences.

How well is enzyme function conserved as a function of pairwise sequence identity?

Author and article information

Journal

Affiliations

Author notes

Article

History

Categories

Comments

Comment on this article

Similar content 248

Cited by 1,657

Most referenced authors 1,647