Improving the accuracy of protein secondary structure prediction using structural alignment

Abstract

Background

The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Many secondary structure prediction methods now routinely achieve a three-state accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.
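Because every accuracy figure in this abstract is a Q3 score, a minimal sketch of how that score is conventionally computed may help: Q3 is the percentage of residues whose three-state label (H for helix, E for strand, C for coil) matches the observed label. The function name `q3_score` and the example strings are illustrative assumptions, not taken from the article.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand)
    and C (coil), one character per residue. Illustrative helper only.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example: 8 of 10 labels agree.
example_q3 = q3_score("HHHHCCEEEC", "HHHCCCEEEE")
print(example_q3)  # 80.0
```

A per-protein score like this would typically be averaged over a test set to give figures such as those quoted in this abstract.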

Results

We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence identity >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. Integrating this structural alignment approach with conventional (sequence-based) secondary structure methods, and then combining the results through a "jury-of-experts" system to generate a consensus, yields very high prediction accuracy. On a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.
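The general idea described above can be sketched as follows: copy secondary structure labels from an aligned homologue onto the query wherever the alignment covers it, and fall back to a consensus of sequence-based predictions elsewhere. This is a simplified illustration under stated assumptions, not the PROTEUS implementation. The function names, the identity definition (identical residues over columns where both sequences are ungapped), the plain majority vote standing in for the "jury-of-experts", and all example data are hypothetical.

```python
from collections import Counter

GAP = "-"


def sequence_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap (assumed definition)."""
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != GAP and t != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def transfer_structure(aligned_query: str, aligned_template: str, template_ss: str) -> list:
    """Map template labels onto query residues; None where the query is unaligned."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    if len(template_ss) != sum(c != GAP for c in aligned_template):
        raise ValueError("template_ss must have one label per template residue")
    mapped, t_idx = [], 0
    for q_char, t_char in zip(aligned_query, aligned_template):
        label = None
        if t_char != GAP:
            label = template_ss[t_idx]  # label of the current template residue
            t_idx += 1
        if q_char != GAP:
            mapped.append(label)  # one entry per query residue
    return mapped


def majority_vote(predictions: list) -> str:
    """Per-residue majority label; ties go to the first-listed predictor."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*predictions))


def combine(mapped: list, sequence_based: list) -> str:
    """Prefer the structure-derived label, else the sequence-based consensus."""
    fallback = majority_vote(sequence_based)
    if len(fallback) != len(mapped):
        raise ValueError("predictions must cover every query residue")
    return "".join(m if m is not None else f for m, f in zip(mapped, fallback))


# Hypothetical alignment of a 7-residue query to a 6-residue template.
query_aln, template_aln = "MKT-AYLG", "MK-VAY-G"
template_labels = "HHHEEC"                          # one label per template residue
predictors = ["HHHCEEC", "HHCCEEC", "HHHCECC"]      # three sequence-based predictions

if sequence_identity(query_aln, template_aln) > 25.0:  # threshold quoted in the abstract
    mapped = transfer_structure(query_aln, template_aln, template_labels)
else:
    mapped = [None] * sum(c != GAP for c in query_aln)
final_prediction = combine(mapped, predictors)
print(final_prediction)  # HHHEEEC
```

In this toy case the two query residues without a template partner (T and L) take their labels from the majority vote, while the remaining five inherit the template's labels directly.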

Conclusion

By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Collection): 2006

Publication date (Electronic): 14 June 2006

Volume: 7

Page: 301

Affiliations

[1] Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada

[2] Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada

Article

Publisher ID: 1471-2105-7-301

DOI: 10.1186/1471-2105-7-301

PMC ID: 1550433

PubMed ID: 16774686

SO-VID: 0ad57439-11cd-479b-b004-72e1bae4aabc

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
