A structural study for the optimisation of functional motifs encoded in protein sequences


Abstract

Background

A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure.
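To make the notion of a pattern "selecting" sequences concrete, the sketch below converts a pattern written in PROSITE syntax into a Python regular expression and scans a sequence with it. It is an illustration, not the authors' software: the function name `prosite_to_regex` and the toy sequence are assumptions, and only the common syntax elements are handled (`x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors outside brackets).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string.

    Simplified sketch: anchors written inside square brackets are not handled.
    """
    pattern = pattern.strip().rstrip(".")      # drop the terminating period
    parts = []
    for element in pattern.split("-"):         # elements are dash-separated
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


# P-loop style pattern written in PROSITE syntax
p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST].")

# Invented toy sequence, used only to show a match
toy_sequence = "MGAPGSGKSTL"
match = re.search(p_loop, toy_sequence)
print(p_loop, match.group(0) if match else None)
```

A sequence is counted as a hit when the regular expression matches anywhere in it; false positives are hits outside the family, and missed true positives are family members with no match.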

Results

Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly performing patterns. The procedure can be summarised as follows:

1. Residues that are structurally conserved in different proteins which are true positives for a pattern are identified by means of a computational technique and by visual inspection.
2. The sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns.
3. The extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity.

The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and, in some cases, sensitivity as well. In some of the cases considered, the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed.
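The optimisation step can be pictured as scoring an original and an extended pattern against a set of sequences with known family membership. The sketch below is a minimal illustration under stated assumptions: the function `evaluate_pattern`, the two regular expressions, and the four toy sequences are all hypothetical, and the abstract does not give the authors' exact definitions of sensitivity and specificity, so the code reports sensitivity as TP/(TP+FN) and precision as TP/(TP+FP).

```python
import re
from typing import Dict, Set


def evaluate_pattern(regex: str, sequences: Dict[str, str],
                     true_members: Set[str]) -> Dict[str, float]:
    """Count true/false positives and false negatives for one pattern."""
    compiled = re.compile(regex)
    hits = {seq_id for seq_id, seq in sequences.items() if compiled.search(seq)}
    tp = len(hits & true_members)      # family members that match
    fp = len(hits - true_members)      # non-members that match
    fn = len(true_members - hits)      # family members that are missed
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Invented toy data: two family members and two unrelated sequences
toy_sequences = {
    "fam1": "AAGKSLLDAA",
    "fam2": "CCGKTPPPDCC",
    "other1": "EEGKSEEEEEEE",
    "other2": "QQQQQQ",
}
toy_family = {"fam1", "fam2"}

# Hypothetical original pattern and an extension adding one residue position
original = evaluate_pattern("GK[ST]", toy_sequences, toy_family)
extended = evaluate_pattern("GK[ST].{2,4}D", toy_sequences, toy_family)
print(original)
print(extended)
```

In this toy case the extra position removes the single false positive without losing either family member, which is the kind of gain the procedure looks for when a structurally conserved residue lies outside the original pattern.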

Conclusion

Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) that fails to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of structurally conserved residues is already available on request and will soon be accessible on our web server. The procedure is intended for pattern database curators and for scientists interested in a specific protein family for which no specific or selective patterns are yet available.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Collection): 2004

Publication date (Electronic): 30 April 2004

Volume: 5

Page: 50

Affiliations

[1] Centre for Molecular Bioinformatics, Dept. of Biology, University of Rome Tor Vergata, Rome (Italy)

Article

Publisher ID: 1471-2105-5-50

DOI: 10.1186/1471-2105-5-50

PMC ID: 420233

PubMed ID: 15119965

SO-VID: 8a8e002f-7a2c-4195-bf34-2ca851dba415

Copyright © 2004 Via and Helmer-Citterich; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
