
      Annotation of biologically relevant ligands in UniProtKB using ChEBI

          Abstract

          Motivation

          To provide high-quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), in order to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small-molecule ligands.

          Results

          We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides.
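A minimal sketch of why stable ChEBI identifiers make ligand annotations computationally tractable: binding sites whose ligands carry varying free-text names can still be grouped reliably by identifier. The `BindingSite` record, its fields, the placeholder accessions and the toy records are hypothetical illustrations, not the UniProtKB data model. The only assumptions about ChEBI are its `CHEBI:<digits>` identifier format and the two example identifiers (CHEBI:30616 for ATP, CHEBI:18420 for Mg(2+)).

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# ChEBI identifiers take the form "CHEBI:" followed by digits.
CHEBI_ID = re.compile(r"^CHEBI:\d+$")


@dataclass(frozen=True)
class BindingSite:
    """Hypothetical, simplified binding-site record (not the UniProtKB schema)."""
    accession: str    # protein record identifier (placeholders below)
    start: int        # first residue of the site (1-based)
    end: int          # last residue of the site (inclusive)
    ligand_name: str  # free-text name, which may vary between records
    chebi_id: str     # stable ChEBI identifier for the cognate ligand

    def __post_init__(self) -> None:
        # Reject identifiers that do not follow the ChEBI format.
        if not CHEBI_ID.match(self.chebi_id):
            raise ValueError(f"not a ChEBI identifier: {self.chebi_id!r}")
        # Reject empty or inverted residue ranges.
        if not 1 <= self.start <= self.end:
            raise ValueError("invalid residue range")


def group_by_ligand(sites):
    """Group sites by ChEBI ID so that name variants map to one ligand."""
    groups = defaultdict(list)
    for site in sites:
        groups[site.chebi_id].append(site)
    return dict(groups)


# Toy records with placeholder accessions; two different names for ATP.
sites = [
    BindingSite("EXAMPLE1", 10, 17, "ATP", "CHEBI:30616"),
    BindingSite("EXAMPLE2", 55, 55, "adenosine 5'-triphosphate", "CHEBI:30616"),
    BindingSite("EXAMPLE2", 120, 120, "Mg(2+)", "CHEBI:18420"),
]

by_ligand = group_by_ligand(sites)
print(sorted(s.accession for s in by_ligand["CHEBI:30616"]))
# → ['EXAMPLE1', 'EXAMPLE2']
```

Grouping on the identifier rather than the name means that "ATP" and "adenosine 5'-triphosphate" fall into the same group, which a name-based lookup would miss.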

          Availability and implementation

          Binding site annotations for cognate ligands described using ChEBI are provided for UniProtKB protein sequence records in several formats (text, XML and RDF) and can be queried and downloaded freely through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/).
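The REST API mentioned above can be called with any HTTP client. The sketch below only builds a UniProtKB search URL with Python's standard library and leaves the network call optional. It assumes the UniProt REST service at `https://rest.uniprot.org`; the `chebi:` query prefix and the `ft_binding` return field are assumptions to check against the current UniProt query-field and return-field documentation before use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# UniProtKB search endpoint of the UniProt REST service.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields=("accession", "ft_binding"), fmt="tsv", size=25):
    """Build a UniProtKB search URL.

    Assumption: "ft_binding" is taken to be the return field for binding-site
    features; verify the field names against the UniProt documentation.
    """
    params = {
        "query": query,               # UniProt query syntax
        "fields": ",".join(fields),   # columns to return
        "format": fmt,                # e.g. tsv, json
        "size": size,                 # results per page
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch(url, timeout=30.0):
    """Download the response body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical query: reviewed entries with a binding site for CHEBI:30616.
# Assumption: "chebi:" is accepted as a search prefix; verify before use.
url = build_search_url("chebi:30616 AND reviewed:true")
print(url)
# text = fetch(url)  # uncomment to query the live service
```

Keeping URL construction separate from the download makes the query easy to inspect, and the same URL can be pasted into a browser or passed to another client.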

          Supplementary information

          Supplementary data are available at Bioinformatics online.

                Author and article information

                Journal
                Bioinformatics, Oxford University Press
                ISSN 1367-4803 (print), 1367-4811 (online)
                Publication dates: January 2023; 09 December 2022
                Volume 39, Issue 1, btac793
                Affiliations
                Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211 Geneva 4, Switzerland
                European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK
                Protein Information Resource, University of Delaware, Newark, DE 19711, USA
                Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
                Author notes
                To whom correspondence should be addressed: alan.bridge@sib.swiss
                Author information
                https://orcid.org/0000-0001-8314-404X
                https://orcid.org/0000-0003-1608-9954
                https://orcid.org/0000-0001-5436-7383
                https://orcid.org/0000-0003-4543-637X
                https://orcid.org/0000-0002-4348-0070
                https://orcid.org/0000-0001-8890-2268
                https://orcid.org/0000-0003-2148-9135
                Article
                btac793
                DOI: 10.1093/bioinformatics/btac793
                PMCID: PMC9825770
                PMID: 36484697
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                19 August 2022; 09 November 2022; 06 December 2022; 08 December 2022; 21 December 2022
                Page count
                Pages: 5
                Funding
                Funded by: National Eye Institute, DOI 10.13039/100000053;
                Funded by: National Human Genome Research Institute, DOI 10.13039/100000051;
                Categories
                Original Paper; Databases and Ontologies; AcademicSubjects/SCI01060
                Bioinformatics & Computational biology
