Abstract
Rational classification of proteins encoded in sequenced genomes is critical for making
the genome sequences maximally useful for functional and evolutionary studies. The
database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic
classification of the proteins encoded in 21 complete genomes of bacteria, archaea
and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying
the criterion of consistency of genome-specific best hits to the results of an exhaustive
comparison of all protein sequences from these genomes. The database comprises 2091
COGs that include 56-83% of the gene products from each of the complete bacterial
and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae
genome. The COG database is accompanied by the COGNITOR program that is used to fit
new proteins into the COGs and can be applied to functional and phylogenetic annotation
of newly sequenced genomes.
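
As an illustration of the best-hit consistency criterion described above, the following minimal sketch groups proteins from an all-against-all comparison. It rests on assumptions that the abstract does not state: proteins are represented as (genome, name) tuples, similarity scores are supplied as a plain dictionary, consistency is taken to mean triangles of reciprocal genome-specific best hits spanning three different genomes, and triangles that share a side are merged. The names `best_hits`, `build_clusters` and `toy_scores` are hypothetical, and the scores are invented for the example.

```python
from itertools import combinations


def best_hits(scores):
    """Return {(query, target_genome): best-scoring subject in that genome}.

    `scores` maps (query, subject) pairs to similarity scores, where each
    protein is a (genome, name) tuple. Hits within one genome are ignored.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def build_clusters(scores):
    """Group proteins via triangles of reciprocal genome-specific best hits."""
    best = best_hits(scores)

    def linked(a, b):
        # Assumption: a link requires the best hit to hold in both directions.
        return best.get((a, b[0])) == b and best.get((b, a[0])) == a

    proteins = sorted({p for pair in scores for p in pair})
    triangles = [
        frozenset(trio)
        for trio in combinations(proteins, 3)
        if len({p[0] for p in trio}) == 3
        and all(linked(a, b) for a, b in combinations(trio, 2))
    ]

    # Union-find over triangles: join any two that share a side (two proteins).
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(group) for group in groups.values())


def symmetric(pairs):
    """Expand {(a, b): score} so that (b, a) carries the same score."""
    out = {}
    for (a, b), score in pairs.items():
        out[(a, b)] = score
        out[(b, a)] = score
    return out


# Invented toy data: four genomes A-D, with a weaker paralog a2 in genome A.
a1, a2 = ("A", "a1"), ("A", "a2")
b1, c1, d1 = ("B", "b1"), ("C", "c1"), ("D", "d1")
toy_scores = symmetric({
    (a1, b1): 100, (a1, c1): 90, (b1, c1): 95,
    (a2, b1): 40, (a2, c1): 30,
    (a1, d1): 80, (b1, d1): 85,
})

clusters = build_clusters(toy_scores)
print(clusters)
```

In this toy data set the triangles a1-b1-c1 and a1-b1-d1 share the side a1-b1, so they merge into a single group of four proteins even though c1 and d1 have no direct hit. The weaker paralog a2 is left out because its best hits are not reciprocated.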