Network deconvolution as a general method to distinguish direct dependencies in networks

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Recognizing direct relationships between variables connected in a network is a pervasive problem in biological, social and information sciences as correlation-based networks contain numerous indirect relationships. Here we present a general method for inferring direct effects from an observed correlation matrix containing both direct and indirect effects. We formulate the problem as the inverse of network convolution, and introduce an algorithm that removes the combined effect of all indirect paths of arbitrary length in a closed-form solution by exploiting eigen-decomposition and infinite-series sums. We demonstrate the effectiveness of our approach in several network applications: distinguishing direct targets in gene expression regulatory networks; recognizing directly-interacting amino-acid residues for protein structure prediction from sequence alignments; and distinguishing strong collaborations in co-authorship social networks using connectivity information alone.

Related collections

Most cited references 52

Record: found
Abstract: found
Article: found

Is Open Access

Protein 3D Structure Computed from Evolutionary Sequence Variation

Debora S. Marks, Lucy Colwell, Robert Sheridan … (2011)

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues., including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

0 comments Cited 443 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Direct-coupling analysis of residue coevolution captures native contacts across many protein families.

Faruck Morcos, Andrea Pagnani, Bryan Lunt … (2011)

The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.

0 comments Cited 395 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments.

David T. W. Jones, Daniel Buchan, Domenico Cozzetto … (2012)

The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.

0 comments Cited 347 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-journal-id): 9604648

Journal ID (pubmed-jr-id): 20305

Journal ID (nlm-ta): Nat Biotechnol

Journal ID (iso-abbrev): Nat. Biotechnol.

Title: Nature biotechnology

ISSN (Print): 1087-0156

ISSN (Electronic): 1546-1696

Publication date Nihms-submitted: 6 August 2013

Publication date (Electronic): 14 July 2013

Publication date (Print): August 2013

Publication date PMC-release: 01 February 2014

Volume: 31

Issue: 8

Pages: 726-733

Affiliations

[1 ]Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA

[2 ]Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA

[3 ]Research Laboratory of Electronics (RLE) at MIT, Cambridge, Massachusetts, USA

Author notes

Correspondence and requests for materials should be addressed to M.K. ( manoli@ 123456mit.edu ). All code and datasets are available at http://compbio.mit.edu/nd and Supplementary Data File

Article

Manuscript ID: NIHMS492182

DOI: 10.1038/nbt.2635

PMC ID: 3773370

PubMed ID: 23851448

SO-VID: df757ed6-d00b-446d-beb3-19d86d25cafe

History

Funding

Funded by: National Human Genome Research Institute : NHGRI

Award ID: RC2 HG005639 || HG

Funded by: National Human Genome Research Institute : NHGRI

Award ID: RC1 HG005334 || HG

Funded by: National Human Genome Research Institute : NHGRI

Award ID: R01 HG004037 || HG

Comments

Comment on this article

scite_

Cited by 95

See all cited by

Most referenced authors 756

See all reference authors

Network deconvolution as a general method to distinguish direct dependencies in networks

Read this article at

Abstract

Related collections

Genome Engineering using CRISPR

Most cited references 52

Protein 3D Structure Computed from Evolutionary Sequence Variation

Direct-coupling analysis of residue coevolution captures native contacts across many protein families.

PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments.

Author and article information

Journal

Affiliations

Author notes

Article

History

Funding

Categories

Comments

Comment on this article

Similar content 40

Cited by 95

Most referenced authors 756