39
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Prediction of protein domain boundaries from inverse covariances

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          It has been known even since relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and play the role of reusable functional modules found in many contexts. In many structural biology tasks, in particular structure prediction, it is of great use to be able to identify domains within the structure and analyze these regions separately. However, when using sequence data alone this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information including a kernel smoothing-based approach and methods based on building alpha-carbon models and compare performance with a length-based predictor, a homology search method and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered and is not significantly different to the homology-based method. Considering only multidomain targets the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel smoothing method therefore represents a potentially useful improvement to ab initio domain prediction. Proteins 2013. © 2012 Wiley Periodicals, Inc.

          Related collections

          Most cited references52

          • Record: found
          • Abstract: found
          • Article: not found

          Sparse inverse covariance estimation with the graphical lasso.

          We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm--the graphical lasso--that is remarkably fast: It solves a 1000-node problem ( approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            A new generation of homology search tools based on probabilistic inference.

            Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Protein 3D Structure Computed from Evolutionary Sequence Variation

              The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues., including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
                Bookmark

                Author and article information

                Journal
                Proteins
                Proteins
                prot
                Proteins
                Wiley Subscription Services, Inc., A Wiley Company (Hoboken )
                0887-3585
                1097-0134
                February 2013
                : 81
                : 2
                : 253-260
                Affiliations
                MRC National Institute for Medical Research, The Ridgeway Mill Hill, London NW71AA, United Kingdom
                Author notes
                Correspondence to: Michael I. Sadowski; MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW71AA, United Kingdom. E-mail: msadows@ 123456nimr.mrc.ac.uk

                Grant sponsor: MRC; Grant number: U117581331

                Article
                10.1002/prot.24181
                3563215
                22987736
                3af47256-50fb-4367-b2ff-17a49b9d8dbb
                Copyright © 2012 Wiley Periodicals, Inc.

                Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

                History
                : 23 May 2012
                : 10 August 2012
                : 04 September 2012
                Categories
                Articles

                Biochemistry
                bioinformatics methods,dlp-svm,domain parsing,domcut,dompro,ft-comar,kernel density estimation,protein contact prediction,protein structure prediction,scooby-do

                Comments

                Comment on this article