Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models


Abstract

Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/.
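As an illustration of the inference step described in the abstract, the following is a minimal Python sketch of the pseudolikelihood objective for a q-state Potts model. For each site r, it computes the conditional probability of the observed state given all other sites in the sequence, then sums the negative logarithms over sites and averages over sequences. The function names, the array layout `J[i, a, j, b]`, and the parameter shapes are assumptions made for this example and are not taken from the authors' code at http://plmdca.csc.kth.se/. The abstract does not define the modified coupling score, so `coupling_scores` uses a Frobenius norm with an average-product correction purely as an assumed stand-in. Regularization and sequence reweighting, which a practical implementation would likely need, are omitted.

```python
import numpy as np


def neg_log_pseudolikelihood(h, J, seqs):
    """Negative log-pseudolikelihood of a Potts model (illustrative sketch).

    h    : (N, q) array of fields, h[i, a]
    J    : (N, q, N, q) array of couplings, J[i, a, j, b] (assumed layout)
    seqs : (B, N) integer array of aligned sequences, states in 0..q-1
    """
    B, N = seqs.shape
    sites = np.arange(N)
    samples = np.arange(B)
    total = 0.0
    for r in range(N):
        # contrib[a, b, j] = J[r, a, j, seqs[b, j]]
        contrib = J[r][:, sites, seqs]
        contrib[:, :, r] = 0.0  # a site does not couple to itself
        # logits[a, b]: energy-like term for putting state a at site r
        logits = h[r][:, None] + contrib.sum(axis=2)
        # numerically stable log of the site-wise normalization
        m = logits.max(axis=0)
        log_z = m + np.log(np.exp(logits - m).sum(axis=0))
        observed = logits[seqs[:, r], samples]
        total -= (observed - log_z).mean()
    return total


def coupling_scores(J, apc=True):
    """Score each site pair by the Frobenius norm of its coupling block.

    The average-product correction (apc) is an assumption of this sketch;
    it presumes symmetric couplings, J[i, a, j, b] == J[j, b, i, a].
    """
    N = J.shape[0]
    F = np.sqrt((J ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(F, 0.0)
    if apc and N > 1:
        row_mean = F.sum(axis=1) / (N - 1)
        total_mean = F.sum() / (N * (N - 1))
        if total_mean > 0:
            F = F - np.outer(row_mean, row_mean) / total_mean
            np.fill_diagonal(F, 0.0)
    return F
```

Minimizing this objective over `h` and `J`, for example with a gradient-based optimizer, and then ranking site pairs by score would yield candidate contacts. With all parameters set to zero the objective equals N log q, which is a convenient sanity check.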


Most cited references 7

Protein 3D Structure Computed from Evolutionary Sequence Variation

Debora S. Marks, Lucy Colwell, Robert Sheridan … (2011)

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

Weak pairwise correlations imply strongly correlated network states in a neural population

Gasper Tkacik (2006)

Biological networks have so many possible states that exhaustive sampling is impossible. Successful analysis thus depends on simplifying hypotheses, but experiments on many systems hint that complicated, higher order interactions among large groups of elements play an important role. In the vertebrate retina, we show that weak correlations between pairs of neurons coexist with strongly collective behavior in the responses of ten or more neurons. Surprisingly, we find that this collective behavior is described quantitatively by models that capture the observed pairwise correlations but assume no higher order interactions. These maximum entropy models are equivalent to Ising models, and predict that larger networks are completely dominated by correlation effects. This suggests that the neural code has associative or error-correcting properties, and we provide preliminary evidence for such behavior. As a first test for the generality of these ideas, we show that similar results are obtained from networks of cultured cortical neurons.

Statistical Analysis of Non-Lattice Data

Julian Besag (1975)

Author and article information

Journal

Publication date (created): 2012-11-06

Publication date (updated): 2013-01-12

Article

DOI: 10.1103/PhysRevE.87.012707

arXiv ID: 1211.1281

SO-VID: bbf8c3e0-9644-450d-a865-424ff25fa6a2

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Journal reference: M. Ekeberg, C. Lövkvist, Y. Lan, M. Weigt, E. Aurell, Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models, Phys. Rev. E 87, 012707 (2013)

Comments: 19 pages, 16 figures, published version

Categories: q-bio.QM, cond-mat.dis-nn, cond-mat.stat-mech, physics.data-an

ScienceOpen disciplines: Condensed matter, Quantitative & Systems biology, Mathematical & Computational physics, Theoretical physics


Related collections

RWTH Aachen Physics Collection
