ComplexContact: a web server for inter-protein contact prediction using deep learning

Abstract

ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based prediction of interfacial residue-residue contacts in a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. Given a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSAs). It then applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from the paired MSAs, and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
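
To make the idea of co-evolution analysis on a paired MSA concrete, the following is a minimal sketch and not the server's actual pipeline. It assumes each alignment is given as a mapping from a species label to an aligned sequence, pairs rows that share a label, and scores every inter-protein column pair with plain mutual information, a simple stand-in for the co-evolution statistic. The function names `pair_by_species`, `mutual_information`, and `interprotein_scores` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def pair_by_species(msa_a, msa_b):
    """Pair aligned sequences from two MSAs that share a species label.

    msa_a, msa_b: dict mapping species label -> aligned sequence
    (an assumed input format). Returns a list of (seq_a, seq_b) tuples.
    """
    shared = sorted(set(msa_a) & set(msa_b))
    return [(msa_a[s], msa_b[s]) for s in shared]


def mutual_information(col_x, col_y):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def interprotein_scores(pairs):
    """Score every (column of protein A, column of protein B) pair.

    Returns a len_a x len_b nested list; a higher score means the two
    columns covary more strongly across the paired sequences.
    """
    seqs_a = [a for a, _ in pairs]
    seqs_b = [b for _, b in pairs]
    cols_a = list(zip(*seqs_a))  # transpose rows into columns
    cols_b = list(zip(*seqs_b))
    return [[mutual_information(ca, cb) for cb in cols_b] for ca in cols_a]
```

For example, calling `interprotein_scores(pair_by_species(msa_a, msa_b))` on two small alignments returns a matrix whose largest entries mark the column pairs that vary together. Raw mutual information is known to be confounded by phylogenetic and indirect-coupling effects, which is one reason a learned model is applied on top of co-evolution signals.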

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 02 July 2018

Publication date (Electronic): 22 May 2018

Publication date PMC-release: 22 May 2018

Volume: 46

Issue: Web Server issue

Pages: W432-W437

Affiliations

[1] School of Computer Science and Technology, Hangzhou Dianzi University, China

[2] King Abdullah University of Science and Technology (KAUST), Saudi Arabia

[3] Toyota Technological Institute at Chicago, USA

[4] Institute for Interdisciplinary Information Sciences, Tsinghua University, China

Author notes

To whom correspondence should be addressed. Email: jinboxu@gmail.com. Correspondence may also be addressed to Qing Wu. Email: wuqing@hdu.edu.cn

The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors.

Article

Publisher ID: gky420

DOI: 10.1093/nar/gky420

PMC ID: 6030867

PubMed ID: 29790960

SO-VID: 5819c776-c7f7-4a31-a2cd-7e34b8b08fb6

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date accepted : 20 May 2018

Date revision received : 22 April 2018

Date received : 13 February 2018

Page count

Pages: 6

Funding

Funded by: National Institutes of Health 10.13039/100000002

Award ID: R01GM089753

Funded by: National Science Foundation 10.13039/100000001

Award ID: DBI-1564955
