Effective inter-residue contact definitions for accurate protein fold recognition

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Background

Effective encoding of residue contact information is crucial for protein structure prediction since it has a unique role to capture long-range residue interactions compared to other commonly used scoring terms. The residue contact information can be incorporated in structure prediction in several different ways: It can be incorporated as statistical potentials or it can be also used as constraints in ab initio structure prediction. To seek the most effective definition of residue contacts for template-based protein structure prediction, we evaluated 45 different contact definitions, varying bases of contacts and distance cutoffs, in terms of their ability to identify proteins of the same fold.

Results

We found that overall the residue contact pattern can distinguish protein folds best when contacts are defined for residue pairs whose Cβ atoms are at 7.0 Å or closer to each other. Lower fold recognition accuracy was observed when inaccurate threading alignments were used to identify common residue contacts between protein pairs. In the case of threading, alignment accuracy strongly influences the fraction of common contacts identified among proteins of the same fold, which eventually affects the fold recognition accuracy. The largest deterioration of the fold recognition was observed for β-class proteins when the threading methods were used because the average alignment accuracy was worst for this fold class. When results of fold recognition were examined for individual proteins, we found that the effective contact definition depends on the fold of the proteins. A larger distance cutoff is often advantageous for capturing spatial arrangement of the secondary structures which are not physically in contact. For capturing contacts between neighboring β strands, considering the distance between Cα atoms is better than the Cβ−based distance because the side-chain of interacting residues on β strands sometimes point to opposite directions.

Conclusion

Residue contacts defined by Cβ−Cβ distance of 7.0 Å work best overall among tested to identify proteins of the same fold. We also found that effective contact definitions differ from fold to fold, suggesting that using different residue contact definition specific for each template will lead to improvement of the performance of threading.

Related collections

Most cited references 42

Record: found
Abstract: found
Article: found

Is Open Access

Data growth and its impact on the SCOP database: new developments

Antonina Andreeva, Dave Howorth, John-Marc Chandonia … (2008)

The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.

0 comments Cited 347 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

How significant is a protein structure similarity with TM-score = 0.5?

Jinrui Xu, Yang Zhang (2010)

Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves cannot provide information on how significant the structural similarity is. Also, it lacks a quantitative relation between the scores and conventional fold classifications. This article aims to answer two questions: (i) what is the statistical significance of TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score? We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. With a TM-score at 0.5, for instance, its P-value is 5.5 x 10(-7), which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability of the same fold proteins from three datasets SCOP, CATH and the consensus of SCOP and CATH. It is found that the posterior probability from different datasets has a similar rapid phase transition around TM-score=0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold while those with a TM-score <0.5 are mainly not in the same fold.

0 comments Cited 280 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: not found
Article: not found

Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation

Sanzo Miyazawa, Robert Jernigan (1985)

0 comments Cited 269 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2012

Publication date (Electronic): 9 November 2012

Volume: 13

Page: 292

Affiliations

[1 ]Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA

[2 ]Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA

Article

Publisher ID: 1471-2105-13-292

DOI: 10.1186/1471-2105-13-292

PMC ID: 3534397

PubMed ID: 23140471

SO-VID: 27ac1ee1-3b14-405f-8474-4a52e43a5dc6

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Effective inter-residue contact definitions for accurate protein fold recognition

Read this article at

Abstract

Background

Results

Conclusion

Related collections

Genetoberfest

Most cited references 42

Data growth and its impact on the SCOP database: new developments

How significant is a protein structure similarity with TM-score = 0.5?

Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation

Author and article information

Journal

Affiliations

Article

History

Categories

Comments

Comment on this article

Similar content 61

Cited by 18

Most referenced authors 811