Abstract
Unaligned amino acid sequences can be characterized by their composition of amino
acid n-tuples (i.e. doublets, triplets, quadruplets, etc.). In this study we used a set of
G-protein-coupled receptor (GPCR) sequences to investigate the performance of two statistics,
termed commonality and specificity, that are derived from n-tuple counts. The
commonality of a tuple is defined as its relative occurrence in the sequences that
belong to a given GPCR subtype. The specificity of a tuple is derived from its relative
occurrence in the sequences of a given GPCR subtype and from its relative non-occurrence
in the sequences that do not belong to this subtype. A graphical presentation, termed
'polygram', is described for the visualization of common and specific tuples. The
method can be applied to the classification of unknown GPCR sequences. It can also
be applied to the identification of fragments of GPCRs, such as may occur in chimeric
receptors. The method is generally applicable to other protein families and other
types of coding.
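
The two statistics described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' definitions: "relative occurrence" is taken to mean the fraction of sequences that contain a tuple at least once, and specificity is taken to be the product of the occurrence inside the subtype and the non-occurrence outside it. The function names and the toy sequences are hypothetical and are not real GPCR data.

```python
def tuples_in(seq, n):
    """Return the set of distinct overlapping n-tuples found in one sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def commonality(tup, seqs):
    """Fraction of sequences in `seqs` that contain `tup` at least once.

    ASSUMPTION: 'relative occurrence' is read as presence/absence per
    sequence, not as a raw count of tuple occurrences.
    """
    if not seqs:
        return 0.0
    n = len(tup)
    return sum(tup in tuples_in(s, n) for s in seqs) / len(seqs)


def specificity(tup, in_subtype, not_in_subtype):
    """Combine occurrence inside the subtype with non-occurrence outside it.

    ASSUMPTION: the two terms are multiplied; the source only says that
    specificity is 'derived from' both quantities.
    """
    return commonality(tup, in_subtype) * (1.0 - commonality(tup, not_in_subtype))


# Hypothetical toy sequences, for illustration only.
subtype_a = ["MKTLLA", "AKTLGG", "KTLMMA"]
others = ["MMAGGA", "GGKTAA"]

ktl_commonality = commonality("KTL", subtype_a)          # in every subtype_a sequence
ktl_specificity = specificity("KTL", subtype_a, others)  # absent from `others`
mma_specificity = specificity("MMA", subtype_a, others)  # occurs in both groups
```

Under these assumptions, the triplet "KTL" occurs in every toy subtype sequence and in none of the others, so it is both fully common and fully specific. "MMA" occurs in both groups and therefore scores lower on specificity.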