      Is Open Access

      How much metagenome data is needed for protein structure prediction: The advantages of targeted approach from the ecological and evolutionary perspectives

      review-article
      iMeta
      John Wiley and Sons Inc.
      ecology, evolution, metagenome data, protein 3D structure modeling, targeted approach


          Abstract

          It has been shown that three‐dimensional protein structures can be modeled by supplementing homologous sequences with metagenome sequences. Even though a large volume of metagenome data is used for this purpose, the structures of a significant proportion of proteins remain unsolved. In this review, we focus on identifying ecological and evolutionary patterns in metagenome data, decoding the complicated relationships between these patterns and protein structures, and investigating how these patterns can be used to improve protein structure prediction. First, we propose the metagenome utilization efficiency and the marginal effect model to quantify the divergent distribution of homologous sequences for protein families. Second, we propose that the targeted approach, which searches specified biomes, identifies homologous sequences more effectively than the untargeted approach's blind search. Finally, we determine a lower bound on the metagenome data required to predict all the protein structures in the Pfam database and show that the present metagenome data is insufficient for this purpose. In summary, we identify ecological and evolutionary patterns in metagenome data that may be used to predict protein structures effectively. The targeted approach is promising for extracting homologous sequences and predicting protein structures using these patterns.

          Abstract

          For protein 3D structure prediction, we mine the data‐dependent ecological and evolutionary patterns hidden in metagenome data. Based on these patterns, the targeted approach is presented as a way to predict protein 3D structure more effectively and accurately than the untargeted approach's blind search.

          Highlights

          • Metagenome data supplement homologous sequences for protein three‐dimensional (3D) structure prediction.

          • Metagenome utilization efficiency shows a divergent distribution of proteins.

          • The marginal effect model also quantifies this divergent distribution of proteins.

          • For mining homologous sequences, the targeted approach outperforms the untargeted approach.

          • Current metagenome data is not enough for modeling 3D structures for all proteins.


                Author and article information

                Contributors
                ningkang@hust.edu.cn
                Journal
                10.1002/(ISSN)2770-596X
                IMT2
                iMeta
                John Wiley and Sons Inc. (Hoboken )
                2770-5986
                2770-596X
                06 March 2022
                March 2022
                Volume: 1
                Issue: 1 ( doiID: 10.1002/imt2.v1.1 )
                Article number: e9
                Affiliations
                [ 1 ] Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular‐Imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
                Author notes
                [*] Correspondence: Kang Ning, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular‐Imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China.

                Email: ningkang@hust.edu.cn

                Author information
                http://orcid.org/0000-0002-2757-3584
                http://orcid.org/0000-0003-3325-5387
                Article
                IMT29
                10.1002/imt2.9
                10989767
                38867727
                © 2022 The Authors. iMeta published by John Wiley & Sons Australia, Ltd on behalf of iMeta Science.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

                History
                : 23 December 2021
                : 07 December 2021
                : 04 January 2022
                Page count
                Figures: 7, Tables: 1, Pages: 16, Words: 9044
                Categories
                Review Article
                Review Articles
