Prediction of protein domain boundaries from inverse covariances

Abstract

It has been known since the time when relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and act as reusable functional modules found in many contexts. In many structural biology tasks, structure prediction in particular, it is very useful to identify the domains within a structure and analyze these regions separately. Using sequence data alone, however, this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on the size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information, including a kernel smoothing-based approach and methods based on building alpha-carbon models, and compare their performance with a length-based predictor, a homology search method, and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered and is not significantly different from the homology-based method. Considering only multidomain targets, the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel-smoothing method therefore represents a potentially useful improvement to ab initio domain prediction. Proteins 2013. © 2012 Wiley Periodicals, Inc.
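
The abstract does not spell out the kernel-smoothing procedure, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes that predicted contacts arrive as (i, j, score) triples, that a good boundary is a position crossed by few predicted contacts, and that a Gaussian kernel is used for smoothing. The function names, the bandwidth, and the terminal margin are all hypothetical choices made for this example.

```python
import math

def crossing_profile(length, contacts):
    """profile[b] = summed score of contacts (i, j) with i < b <= j,
    i.e. the contacts that a cut placed just before residue b would separate."""
    profile = [0.0] * length
    for i, j, score in contacts:
        lo, hi = min(i, j), max(i, j)
        for b in range(lo + 1, hi + 1):
            profile[b] += score
    return profile

def gaussian_smooth(values, bandwidth):
    """Kernel-weighted average of the profile using a Gaussian kernel."""
    n = len(values)
    smoothed = []
    for x in range(n):
        weights = [math.exp(-0.5 * ((x - k) / bandwidth) ** 2) for k in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed

def predict_boundary(length, contacts, bandwidth=5.0, margin=20):
    """Return the position with the lowest smoothed crossing density,
    ignoring `margin` residues at each terminus (assumed, arbitrary value).
    Returns None when the chain is too short to hold a boundary."""
    if length <= 2 * margin:
        return None
    smooth = gaussian_smooth(crossing_profile(length, contacts), bandwidth)
    return min(range(margin, length - margin), key=lambda b: smooth[b])

# Toy input (fabricated for illustration): a 100-residue chain whose
# contacts stay inside residues 0-49 or inside residues 50-99.
toy_contacts = [(i, i + 8, 1.0) for i in range(0, 42)]
toy_contacts += [(i, i + 8, 1.0) for i in range(50, 92)]

print(predict_boundary(100, toy_contacts))
```

In this toy case no contact links the two halves of the chain, so the smoothed profile dips where the halves meet. A real predictor would also need a rule for deciding whether a chain is single-domain, which this sketch reduces to a simple length check.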

Most cited references 52

Sparse inverse covariance estimation with the graphical lasso.

J. Friedman, T. Hastie, R. Tibshirani (2008)

We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the graphical lasso, that is remarkably fast: it solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.

A new generation of homology search tools based on probabilistic inference.

Sean R. Eddy (2009)

Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.

Protein 3D Structure Computed from Evolutionary Sequence Variation

Debora S. Marks, Lucy Colwell, Robert Sheridan … (2011)

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

Author and article information

Journal

Journal ID (nlm-ta): Proteins

Journal ID (iso-abbrev): Proteins

Journal ID (publisher-id): prot

Title: Proteins

Publisher: Wiley Subscription Services, Inc., A Wiley Company (Hoboken)

ISSN (Print): 0887-3585

ISSN (Electronic): 1097-0134

Publication date (Print): February 2013

Volume: 81

Issue: 2

Pages: 253-260

Affiliations

MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom

Author notes

Correspondence to: Michael I. Sadowski; MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom. E-mail: msadows@nimr.mrc.ac.uk

Grant sponsor: MRC; Grant number: U117581331

Article

DOI: 10.1002/prot.24181

PMC ID: 3563215

PubMed ID: 22987736

SO-VID: 3af47256-50fb-4367-b2ff-17a49b9d8dbb

License:

Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

History

Date received : 23 May 2012

Date revision received : 10 August 2012

Date accepted : 04 September 2012
