Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13


Abstract

Predicting residue‐residue distance relationships (e.g., contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated, for the first time, to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
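To make the relationship between residue‐residue distances and contacts concrete, the following is a minimal sketch and not the MULTICOM implementation. It assumes the commonly used convention that two residues are in contact when their representative atoms lie closer than 8 Å, and it scores a predicted contact list by precision. The function names, the 8 Å threshold, and the sequence‐separation cutoff are illustrative assumptions that do not come from the abstract.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def contacts_from_distances(
    dist: Sequence[Sequence[float]],
    threshold: float = 8.0,   # assumed contact cutoff in angstroms
    min_separation: int = 6,  # assumed minimum sequence separation |i - j|
) -> List[Pair]:
    """Return residue pairs (i, j), i < j, whose distance is below the threshold."""
    n = len(dist)
    contacts = []
    for i in range(n):
        # Skip pairs that are close in sequence; they are trivially close in space.
        for j in range(i + min_separation, n):
            if dist[i][j] < threshold:
                contacts.append((i, j))
    return contacts


def contact_precision(predicted: Sequence[Pair], true: Sequence[Pair]) -> float:
    """Fraction of predicted contacts that are also true contacts."""
    if not predicted:
        return 0.0
    true_set = set(true)
    hits = sum(1 for pair in predicted if pair in true_set)
    return hits / len(predicted)


# Toy example: five residues on a straight line, 3.8 angstroms apart.
toy = [[3.8 * abs(i - j) for j in range(5)] for i in range(5)]
toy_contacts = contacts_from_distances(toy, threshold=8.0, min_separation=2)
```

In this toy chain only residues two positions apart (7.6 Å) fall under the cutoff, so `toy_contacts` holds the pairs (0, 2), (1, 3), and (2, 4). A real evaluation would derive the distance matrix from experimental coordinates and compare it with the top‐ranked predicted pairs.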


Author and article information

Contributors

Jianlin Cheng:

ORCID: https://orcid.org/0000-0003-0305-2853

chengji@missouri.edu

Journal

Journal ID (nlm-ta): Proteins

Journal ID (iso-abbrev): Proteins

Journal ID (doi): 10.1002/(ISSN)1097-0134

Journal ID (publisher-id): PROT

Title: Proteins

Publisher: John Wiley & Sons, Inc. (Hoboken, USA)

ISSN (Print): 0887-3585

ISSN (Electronic): 1097-0134

Publication date (Electronic): 25 April 2019

Publication date (Print): December 2019

Volume: 87

Issue: 12, Critical Assessment of Methods of Protein Structure Prediction (CASP) Special Issue (doiID: 10.1002/prot.v87.12)

Pages: 1165-1178

Affiliations

[¹] Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri

[²] Department of Computer Science, Pacific Lutheran University, Tacoma, Washington

Author notes

[*] Correspondence

Jianlin Cheng, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211.

Email: chengji@missouri.edu

Author information

Renzhi Cao https://orcid.org/0000-0002-8345-343X

Jianlin Cheng https://orcid.org/0000-0003-0305-2853

Article

Publisher ID: PROT25697

DOI: 10.1002/prot.25697

PMC ID: 6800999

PubMed ID: 30985027

SO-VID: 857cbf37-4e92-49f0-827b-bb4db939a760

License:

This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

History

Date received: 16 February 2019

Date revision received: 04 April 2019

Date accepted: 12 April 2019

Page count

Figures: 10, Tables: 0, Pages: 14, Words: 8832

Funding

Funded by: National Institutes of Health, open-funder-registry 10.13039/100000002

Award ID: R01GM093123

Funded by: National Science Foundation, open-funder-registry 10.13039/100000001

Award ID: DBI1759934

Award ID: IIS1763246


ScienceOpen disciplines: Biochemistry

Keywords: contact prediction, deep learning, distance prediction, protein model quality assessment, protein structure prediction, template‐based modeling, template‐free modeling
