255
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Protein 3D Structure Computed from Evolutionary Sequence Variation

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.

          In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.

          We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues., including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å C α-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

          Related collections

          Most cited references77

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Weak pairwise correlations imply strongly correlated network states in a neural population

          Biological networks have so many possible states that exhaustive sampling is impossible. Successful analysis thus depends on simplifying hypotheses, but experiments on many systems hint that complicated, higher order interactions among large groups of elements play an important role. In the vertebrate retina, we show that weak correlations between pairs of neurons coexist with strongly collective behavior in the responses of ten or more neurons. Surprisingly, we find that this collective behavior is described quantitatively by models that capture the observed pairwise correlations but assume no higher order interactions. These maximum entropy models are equivalent to Ising models, and predict that larger networks are completely dominated by correlation effects. This suggests that the neural code has associative or error-correcting properties, and we provide preliminary evidence for such behavior. As a first test for the generality of these ideas, we show that similar results are obtained from networks of cultured cortical neurons.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Protein sectors: evolutionary units of three-dimensional structure.

            Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term "protein sectors." Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Identification of direct residue contacts in protein-protein interaction by message passing.

              Understanding the molecular determinants of specificity in protein-protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context covariance-based methods have been used to identify correlation between amino acid positions in interacting proteins. However, these methods have an important shortcoming, in that they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis, adopted from use in statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space without resorting to ad hoc tuning parameters, both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The spectacular success of this approach illustrates the effectiveness of the global inference approach in identifying direct interaction based on sequence information alone. We expect this method to be applicable soon to interaction surfaces between proteins present in only 1 copy per genome as the number of sequenced genomes continues to expand. Use of this method could significantly increase the potential targets for therapeutic intervention, shed light on the mechanism of protein-protein interaction, and establish the foundation for the accurate prediction of interacting protein partners.
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS One
                plos
                plosone
                PLoS ONE
                Public Library of Science (San Francisco, USA )
                1932-6203
                2011
                7 December 2011
                : 6
                : 12
                : e28766
                Affiliations
                [1 ]Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
                [2 ]MRC Laboratory of Molecular Biology, Hills Road, Cambridge, United Kingdom
                [3 ]Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
                [4 ]Human Genetics Foundation, Torino, Italy
                [5 ]Politecnico di Torino, Torino, Italy
                University of California San Francisco, United States of America
                Author notes

                Conceived and designed the experiments: DM CS. Performed the experiments: DM LC RS TH AP. Analyzed the data: DM LC RS TH AP RZ CS. Wrote the paper: DM LC CS. Development of statistical method: AP RZ.

                Article
                PONE-D-11-22341
                10.1371/journal.pone.0028766
                3233603
                22163331
                c2d36742-2088-432c-ac19-5612027cd5f7
                Marks et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                : 10 November 2011
                : 14 November 2011
                Page count
                Pages: 20
                Categories
                Research Article
                Biology
                Biochemistry
                Proteins
                Protein Structure
                Biophysics
                Protein Folding
                Computational Biology
                Macromolecular Structure Analysis
                Protein Structure
                Evolutionary Modeling
                Genomics
                Systems Biology
                Physics
                Biophysics
                Protein Folding
                Interdisciplinary Physics
                Statistical Mechanics

                Uncategorized
                Uncategorized

                Comments

                Comment on this article