Abstract

We have studied the use of nearest-neighbor classifiers to predict the secondary structure of proteins. The nearest-neighbor rule states that a test instance is classified according to the classifications of "nearby" training examples from a database of known structures. In the context of secondary structure prediction, the test instances are windows of n consecutive residues, and the label is the secondary structure type (alpha-helix, beta-strand, or coil) of the center position of the window. To define the neighborhood of a test instance, we employed a novel similarity metric based on the local structural environment scoring scheme of Bowie et al. In this manner, we have attempted to exploit the underlying structural similarity between segments of different proteins to aid in the prediction of secondary structure. Furthermore, in addition to using neighborhoods of fixed radius, we explored a modification of the standard nearest-neighbor algorithm that involved defining an "effective radius" for each exemplar by measuring its performance on a training set. Using these ideas, we achieved a peak prediction accuracy of 68%. Finally, we sought to improve the biological utility of secondary structure prediction by identifying the subset of the predictions that are most likely to be correct. Toward this end, we developed a nearest-neighbor estimator that produced not the traditional "one-state" prediction (alpha-helix, beta-strand, or coil) but rather a probability distribution over the three states. It should be emphasized that this scheme estimates true probability values and that the resulting numbers are not pseudo-probability scores generated by simple normalization of the raw output of the predictor. Applying the mutual information statistic, we found that these probability triplets possess 58% more information than the one-state predictions. Furthermore, the probability estimates allow one to assign an a priori confidence level to the prediction at each residue. 
Using this approach, we found that the top 28% of the predictions were 86% accurate and the top 43% of the predictions were 81% accurate. These results indicate that, notwithstanding the limitations on overall accuracy of secondary structure prediction, a substantial proportion of a protein can be predicted with considerable accuracy.
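The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the helper names (`windows`, `similarity`, `predict_proba`, `confident_predictions`), the toy sequence, and above all the similarity function, which simply counts matching residues rather than using the structural environment scoring of Bowie et al. The probability triplet here is the label frequency among the k nearest exemplars, a common nearest-neighbor estimate; the abstract does not specify how the paper's estimator is constructed, and the "effective radius" modification is not modeled.

```python
from collections import Counter

STATES = ("H", "E", "C")  # alpha-helix, beta-strand, coil


def windows(sequence, labels, n):
    """Yield (window, label of the center residue) for each full window of odd length n."""
    half = n // 2
    for i in range(half, len(sequence) - half):
        yield sequence[i - half:i + half + 1], labels[i]


def similarity(a, b):
    """Toy similarity: number of positions at which two windows share a residue.

    Assumption: a stand-in for the structural-environment metric in the paper.
    """
    return sum(x == y for x, y in zip(a, b))


def predict_proba(window, exemplars, k):
    """Estimate a distribution over STATES from the k most similar exemplars."""
    ranked = sorted(exemplars, key=lambda ex: similarity(window, ex[0]), reverse=True)
    counts = Counter(label for _, label in ranked[:k])
    total = sum(counts.values())
    return {state: counts[state] / total for state in STATES}


def confident_predictions(probas, threshold):
    """Keep (index, state, probability) where the top probability meets the threshold."""
    kept = []
    for i, p in enumerate(probas):
        state = max(STATES, key=lambda s: p[s])
        if p[state] >= threshold:
            kept.append((i, state, p[state]))
    return kept


# Toy data, invented for illustration only (not from any structure database).
train_seq = "AAAAAVVVVVGGGGG"
train_lab = "HHHHHEEEEECCCCC"
exemplars = list(windows(train_seq, train_lab, 5))

probas = [predict_proba(w, exemplars, k=3) for w in ("AAAAA", "VVVVV")]
selected = confident_predictions(probas, threshold=0.8)
```

Raising `threshold` trades coverage for accuracy, which is the idea behind reporting accuracy on only the most confident subset of predictions.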

Author and article information

Journal

PubMed ID: 8371270

DOI: 10.1006/jmbi.1993.1464

ScienceOpen disciplines: Chemistry

Keywords: Algorithms; Artificial Intelligence; Databases, Factual; Models, Chemical; Neural Networks (Computer); Probability; Protein Structure, Secondary; Reproducibility of Results

Protein secondary structure prediction using nearest-neighbor methods.

Related collections

Artificial Intelligence in Medicine
