      Is Open Access

      Wikidata as a knowledge graph for the life sciences

      research-article
      eLife
      eLife Sciences Publications, Ltd
      science forum, knowledge graphs, data mining, drug repurposing, Wikidata


          Abstract

          Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.


                Author and article information

                Contributors
                Role: Reviewing Editor
                Role: Senior Editor
                Journal
                eLife
                eLife Sciences Publications, Ltd
                ISSN: 2050-084X
                Published: 17 March 2020
                Volume: 9
                Article: e52614
                Affiliations
                [1] Micelio, Antwerpen, Belgium
                [2] Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
                [3] Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
                [4] McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
                [5] Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
                [6] European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
                [7] School of Chemistry, The University of Sydney, Sydney, Australia
                [8] Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, United States
                [9] Wellcome Trust Sanger Institute, Cambridge, United Kingdom
                [10] School of Data Science, University of Virginia, Charlottesville, United States
                [11] University of Maryland School of Medicine, Baltimore, United States
                [12] Department of Animal, Plant and Soil Sciences, La Trobe University, Melbourne, Australia
                [13] Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
                [14] Retired researcher, Berlin, Germany
                [15] Yale University Library, Yale University, New Haven, United States
                eLife, United Kingdom
                United States
                Author notes
                [†] These authors contributed equally to this work.

                Author information
                https://orcid.org/0000-0001-9773-4008
                https://orcid.org/0000-0002-0644-7212
                https://orcid.org/0000-0003-4640-3510
                https://orcid.org/0000-0002-7334-7852
                https://orcid.org/0000-0002-6388-446X
                https://orcid.org/0000-0002-0843-4271
                https://orcid.org/0000-0001-5410-599X
                https://orcid.org/0000-0001-8479-0262
                https://orcid.org/0000-0002-3348-3622
                https://orcid.org/0000-0002-2967-3079
                https://orcid.org/0000-0002-3356-3542
                https://orcid.org/0000-0001-5916-0947
                https://orcid.org/0000-0002-7792-0150
                https://orcid.org/0000-0001-9488-1870
                https://orcid.org/0000-0003-0719-3485
                https://orcid.org/0000-0001-5706-2163
                https://orcid.org/0000-0002-4291-0737
                https://orcid.org/0000-0002-4693-0591
                https://orcid.org/0000-0003-0169-8159
                https://orcid.org/0000-0001-8910-9851
                https://orcid.org/0000-0002-2298-7593
                https://orcid.org/0000-0001-8449-1318
                https://orcid.org/0000-0002-4650-631X
                https://orcid.org/0000-0002-4499-0451
                https://orcid.org/0000-0001-9536-9115
                https://orcid.org/0000-0002-7899-1604
                https://orcid.org/0000-0001-6334-452X
                https://orcid.org/0000-0001-7542-0286
                https://orcid.org/0000-0002-2629-6124
                https://orcid.org/0000-0002-9859-4104
                Article
                Publisher ID: 52614
                DOI: 10.7554/eLife.52614
                PMCID: PMC7077981
                PMID: 32180547
                8985a17e-4625-40d3-b5f0-c6874586b873
                © 2020, Waagmeester et al

                This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

                History
                Received: 09 October 2019
                Accepted: 28 February 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences
                Award ID: R01 GM089820
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences
                Award ID: U54 GM114833
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences
                Award ID: R01 GM100039
                Funded by: FundRef http://dx.doi.org/10.13039/100000051, National Human Genome Research Institute
                Award ID: R00HG007940
                Funded by: FundRef http://dx.doi.org/10.13039/100000054, National Cancer Institute
                Award ID: U24CA237719
                Funded by: FundRef http://dx.doi.org/10.13039/100001368, V Foundation for Cancer Research
                Award ID: V2018-007
                Funded by: FundRef http://dx.doi.org/10.13039/100000060, National Institute of Allergy and Infectious Diseases
                Award ID: R01 AI126785
                Funded by: FundRef http://dx.doi.org/10.13039/100006108, National Center for Advancing Translational Sciences
                Award ID: UL1 TR002550
                The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
                Categories
                Feature Article
                Computational and Systems Biology
                Science Forum
                Custom metadata
                Wikidata is a continuously updated resource that could improve the efficiency and accuracy of research in many areas of the life and biomedical sciences.

                Life sciences
