      Is Open Access

      RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

      research-article


          Abstract

          The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
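The five-level hierarchy described in the abstract can be pictured as a path from Class down to Family, and the UniProt-centric view as a grouping of structure chains by their UniProt accession. The sketch below is a hypothetical illustration only: the dotted code format (e.g. "III.3.1.2.5"), the `RepeatRegion` record, the chain and accession strings, and every helper name are assumptions made for this example, not RepeatsDB's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Level names of the hierarchy described in the abstract. The first three levels
# are purely structural; Clan and Family additionally require sequence similarity.
LEVELS: Tuple[str, ...] = ("Class", "Topology", "Fold", "Clan", "Family")
STRUCTURE_ONLY_LEVELS = LEVELS[:3]


@dataclass(frozen=True)
class RepeatRegion:
    """Hypothetical repeat-region record; field names are illustrative only."""
    pdb_chain: str    # made-up chain identifier, e.g. "1xyzA"
    uniprot_acc: str  # made-up accession placeholder, e.g. "ACC_1"
    code: str         # assumed dotted code: Class.Topology.Fold[.Clan[.Family]]


def lineage(code: str) -> List[Tuple[str, str]]:
    """Expand a dotted code into (level, prefix) pairs from Class downwards."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or any(not p for p in parts):
        raise ValueError(f"expected 1-{len(LEVELS)} non-empty levels: {code!r}")
    return [(LEVELS[i], ".".join(parts[: i + 1])) for i in range(len(parts))]


def uses_sequence_levels(code: str) -> bool:
    """True if the code reaches Clan or Family, the sequence-based levels."""
    return len(lineage(code)) > len(STRUCTURE_ONLY_LEVELS)


def group_by_uniprot(regions: List[RepeatRegion]) -> Dict[str, List[RepeatRegion]]:
    """UniProt-centric view: collect all structure chains mapped to one accession."""
    groups: Dict[str, List[RepeatRegion]] = defaultdict(list)
    for region in regions:
        groups[region.uniprot_acc].append(region)
    return dict(groups)


if __name__ == "__main__":
    regions = [
        RepeatRegion("1xyzA", "ACC_1", "III.1.1"),
        RepeatRegion("2xyzB", "ACC_1", "III.1.1"),
        RepeatRegion("3xyzA", "ACC_2", "III.3.1.2.5"),
    ]
    print(lineage("III.3.1.2.5"))
    for acc, members in group_by_uniprot(regions).items():
        print(acc, [r.pdb_chain for r in members])
```

Splitting the code on dots makes each ancestor level a simple prefix, so browsing "up" the hierarchy needs no separate lookup table; a real resource would more likely store explicit parent links and a curated chain-to-UniProt mapping.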


                Author and article information

                Journal: Nucleic Acids Research (Nucleic Acids Res), Oxford University Press
                ISSN: 0305-1048 (print); 1362-4962 (online)
                Published: 08 January 2021 (issue); 25 November 2020 (online)
                Volume 49, Issue D1, Pages D452-D457
                Affiliations
                Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
                IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
                Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
                Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
                Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
                Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Univ. Montpellier, Montpellier, France
                Author notes
                To whom correspondence should be addressed. Tel: +39 049 827 6269; Email: silvio.tosatto@unipd.it
                Author information
                http://orcid.org/0000-0003-0011-9397
                http://orcid.org/0000-0001-8210-2390
                http://orcid.org/0000-0003-1691-8425
                http://orcid.org/0000-0003-0362-8218
                http://orcid.org/0000-0001-6650-1711
                http://orcid.org/0000-0002-2342-6886
                http://orcid.org/0000-0003-4525-7793
                Article ID: gkaa1097
                DOI: 10.1093/nar/gkaa1097
                PMCID: PMC7778985
                PMID: 33237313
                © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 15 September 2020
                Revised: 17 October 2020
                Accepted: 19 November 2020
                Page count
                Pages: 6
                Funding
                Funded by: Marie Skłodowska-Curie;
                Award ID: 823886
                Categories
                AcademicSubjects/SCI00010
                Database Issue

                Genetics
