      Is Open Access

      Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

      research-article


          Abstract

          Predicting residue‐residue distance relationships (e.g., contacts) has been the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, challenges remain in accurately predicting protein contact distances when few homologous sequences are available, folding proteins from noisy contact distances, and ranking models of hard targets.
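
          To make the notion of a residue‐residue contact concrete, the following is a minimal sketch, not the MULTICOM implementation. It assumes the conventional definition of a contact as two residues whose representative atoms lie closer than 8 Å, and it scores a predicted contact map by the precision of its top‐k ranked pairs. The abstract states neither the threshold nor the evaluation metric, so both are assumptions here, as are all function names and the `min_sep` sequence‐separation parameter.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances for a list of (x, y, z) residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dist, threshold=8.0):
    """Binary contact map; the 8 Angstrom cutoff is an assumed convention."""
    return [[d < threshold for d in row] for row in dist]

def top_k_precision(pred_prob, true_contacts, k, min_sep=6):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    Only pairs (i, j) with j - i >= min_sep are ranked, so that trivially
    close sequence neighbours do not inflate the score (assumed default).
    """
    n = len(pred_prob)
    pairs = [(i, j) for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(key=lambda p: pred_prob[p[0]][p[1]], reverse=True)
    top = pairs[:k]
    if not top:
        return 0.0
    return sum(1 for i, j in top if true_contacts[i][j]) / len(top)

# Toy usage: ten residues on a straight line, 3.8 Angstrom apart (hypothetical data).
coords = [(3.8 * n, 0.0, 0.0) for n in range(10)]
true_map = contact_map(distance_map(coords))
perfect_prediction = [[1.0 if c else 0.0 for c in row] for row in true_map]
print(top_k_precision(perfect_prediction, true_map, k=5, min_sep=1))
```

          A distance predictor generalizes this by outputting a distribution over distance ranges for each residue pair rather than a single contact probability; the ranking‐based evaluation above would then apply to the probability mass falling below the chosen threshold.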

                Author and article information

                Contributors
                chengji@missouri.edu
                Journal
                Proteins
                10.1002/(ISSN)1097-0134
                PROT
                John Wiley & Sons, Inc. (Hoboken, USA)
                0887-3585
                1097-0134
                25 April 2019
                December 2019
                Volume 87, Issue 12, Critical Assessment of Methods of Protein Structure Prediction (CASP) Special Issue (doiID: 10.1002/prot.v87.12)
                Pages 1165-1178
                Affiliations
                [1] Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
                [2] Department of Computer Science, Pacific Lutheran University, Tacoma, Washington
                Author notes
                [*] Correspondence

                Jianlin Cheng, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211.

                Email: chengji@missouri.edu

                Author information
                https://orcid.org/0000-0002-8345-343X
                https://orcid.org/0000-0003-0305-2853
                Article
                PROT25697
                10.1002/prot.25697
                6800999
                30985027
                857cbf37-4e92-49f0-827b-bb4db939a760
                © 2019 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

                History
                16 February 2019
                04 April 2019
                12 April 2019
                Page count
                Figures: 10, Tables: 0, Pages: 14, Words: 8832
                Funding
                Funded by: National Institutes of Health, open-funder-registry 10.13039/100000002
                Award ID: R01GM093123
                Funded by: National Science Foundation, open-funder-registry 10.13039/100000001
                Award ID: DBI1759934
                Award ID: IIS1763246
                Categories
                Research Article
                3D Structure Modeling

                Biochemistry
                contact prediction, deep learning, distance prediction, protein model quality assessment, protein structure prediction, template‐based modeling, template‐free modeling
