      Is Open Access

      AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models

      research-article


          Abstract

          The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.
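The abstract notes that AlphaFold DB provides per-residue and pairwise confidence estimates alongside the atomic coordinates. The Python below is a minimal sketch of how a reader might summarize those values from downloaded files. It is not code from AlphaFold DB. It assumes a PDB-format model whose B-factor column carries the per-residue pLDDT, and a predicted-aligned-error JSON file holding a one-element list with an N x N matrix under `predicted_aligned_error`. Both layouts are assumptions about the released files, and older releases may differ. The band thresholds (90, 70, 50) follow the colour key shown on AlphaFold DB entry pages. How a value falling exactly on a threshold is assigned is an assumption here. All function names are hypothetical.

```python
"""Minimal sketch: read AlphaFold DB confidence values from downloaded files.

Assumptions (not taken from the paper): pLDDT sits in the PDB B-factor column,
and the PAE JSON is a one-element list holding an N x N matrix under
"predicted_aligned_error". Function names here are hypothetical.
"""
import json
from collections import Counter


def plddt_per_residue(pdb_text: str) -> dict:
    """Return {(chain_id, residue_number): pLDDT} from C-alpha ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): record 1-6, atom name 13-16,
        # chain 22, residue number 23-26, B-factor 61-66.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Map pLDDT onto four bands; handling of exact boundaries is an assumption."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def band_counts(pdb_text: str) -> Counter:
    """Count residues per confidence band."""
    return Counter(confidence_band(v) for v in plddt_per_residue(pdb_text).values())


def mean_pae(pae_json: str, rows: range, cols: range) -> float:
    """Mean predicted aligned error (in Å) over a block of residue pairs.

    rows and cols are 0-based residue index ranges, e.g. two putative domains.
    """
    matrix = json.loads(pae_json)[0]["predicted_aligned_error"]
    block = [matrix[i][j] for i in rows for j in cols]
    return sum(block) / len(block)


def format_atom_line(serial, name, resname, chain, resnum, bfactor, element):
    """Hypothetical demo helper: write one fixed-width PDB ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{bfactor:>6.2f}"
            f"          {element:>2}")


if __name__ == "__main__":
    # Synthetic four-residue model; the extra N atom is skipped by the parser.
    demo = "\n".join([
        format_atom_line(1, " N", "MET", "A", 1, 95.1, "N"),
        format_atom_line(2, " CA", "MET", "A", 1, 95.1, "C"),
        format_atom_line(3, " CA", "GLY", "A", 2, 82.0, "C"),
        format_atom_line(4, " CA", "SER", "A", 3, 61.5, "C"),
        format_atom_line(5, " CA", "LYS", "A", 4, 30.2, "C"),
    ])
    print(plddt_per_residue(demo))
    print(band_counts(demo))
    pae = '[{"predicted_aligned_error": [[0, 2, 10], [2, 0, 12], [10, 12, 0]]}]'
    print(mean_pae(pae, range(0, 2), range(2, 3)))  # 11.0
```

Averaging the PAE block between two putative domains is one rough way to judge whether their relative placement is reliable. Low values suggest a confident relative arrangement. High values suggest that the arrangement should not be over-interpreted, even when each domain has a high pLDDT on its own.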


                Author and article information

                Contributors
                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN: 0305-1048 (print); 1362-4962 (online)
                Published online: 17 November 2021
                Issue date: 07 January 2022
                Volume 50, Issue D1, pages D439–D444
                Affiliations
                European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
                DeepMind, London, UK
                Author notes
                To whom correspondence should be addressed. Tel: +44 1223 49 4646; Fax: +44 1223 49 4468; Email: sameer@ebi.ac.uk
                Correspondence may also be addressed to Demis Hassabis. Email: dhcontact@deepmind.com
                Author information
                https://orcid.org/0000-0002-3687-0839
                https://orcid.org/0000-0002-9043-7665
                https://orcid.org/0000-0001-8314-8497
                https://orcid.org/0000-0002-8439-5964
                Article
                Article ID: gkab1061
                DOI: 10.1093/nar/gkab1061
                PMCID: PMC8728224
                PMID: 34791371
                © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 07 September 2021
                Revised: 14 October 2021
                Accepted: 19 October 2021
                Page count
                Pages: 6
                Funding
                Funded by: DeepMind (Funder Registry DOI: 10.13039/100017149)
                Categories
                AcademicSubjects/SCI00010
                NAR Breakthrough Article

                Genetics
