      Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

      PLoS Biology
      Public Library of Science


          Abstract

          In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

          Most cited references (16)


          Identifiers.org and MIRIAM Registry: community resources to provide persistent identification

          The Minimum Information Required in the Annotation of Models Registry (http://www.ebi.ac.uk/miriam) provides unique, perennial and location-independent identifiers for data used in the biomedical domain. At its core is a shared catalogue of data collections, for each of which an individual namespace is created, and extensive metadata recorded. This namespace allows the generation of Uniform Resource Identifiers (URIs) to uniquely identify any record in a collection. Moreover, various services are provided to facilitate the creation and resolution of the identifiers. Since its launch in 2005, the system has evolved in terms of the structure of the identifiers provided, the software infrastructure, the number of data collections recorded, as well as the scope of the Registry itself. We describe here the new parallel identification scheme and the updated supporting software infrastructure. We also introduce the new Identifiers.org service (http://identifiers.org) that is built upon the information stored in the Registry and which provides directly resolvable identifiers, in the form of Uniform Resource Locators (URLs). The flexibility of the identification scheme and resolving system allows its use in many different fields, where unambiguous and perennial identification of data entities are necessary.
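          The namespace-plus-record scheme described above can be sketched in a few lines. This is a minimal illustration, not the Registry's own software: the `https://identifiers.org/prefix:accession` URL form, the prefix pattern, and the function names `to_url` and `split_compact` are assumptions made for the example.

```python
import re
from typing import Tuple

# Assumption: a resolvable URL is the resolver base followed by "prefix:accession".
RESOLVER = "https://identifiers.org/"

# Assumption: namespace prefixes are lowercase letters, digits, "_" or ".".
_PREFIX = re.compile(r"^[a-z0-9_.]+$")


def to_url(prefix: str, accession: str) -> str:
    """Build a resolvable URL from a namespace prefix and a local record ID."""
    prefix = prefix.strip().lower()
    accession = accession.strip()
    if not _PREFIX.match(prefix):
        raise ValueError(f"unexpected namespace prefix: {prefix!r}")
    if not accession:
        raise ValueError("empty local identifier")
    return f"{RESOLVER}{prefix}:{accession}"


def split_compact(identifier: str) -> Tuple[str, str]:
    """Split 'prefix:accession' at the first colon into its two parts."""
    prefix, sep, accession = identifier.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a prefix:accession identifier: {identifier!r}")
    return prefix.lower(), accession


if __name__ == "__main__":
    print(to_url("uniprot", "P12345"))
    print(split_compact("taxonomy:9606"))
```

          Splitting at the first colon only keeps local identifiers that themselves contain a colon intact, which matters for collections whose accessions embed their own prefix.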

            Gene name errors are widespread in the scientific literature

            The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions. Electronic supplementary material: the online version of this article (doi:10.1186/s13059-016-1044-7) contains supplementary material, which is available to authorized users.
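            A scan of this kind can be approximated with two regular expressions. The sketch below is illustrative only and is not the method used in the cited study: the patterns, the function names, and the sample values are assumptions, and a real scan would need to read the spreadsheet files themselves.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Assumption: date conversions appear as "2-Sep" or "Sep-2" style strings.
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$", re.IGNORECASE
)

# Assumption: floating-point conversions appear in scientific notation, e.g. "2.31E+13".
FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E\+?\d+$", re.IGNORECASE)


def looks_converted(cell: str) -> bool:
    """Return True if a cell value looks like a date or a float, not a gene name."""
    value = cell.strip()
    return bool(DATE_LIKE.match(value) or FLOAT_LIKE.match(value))


def flag_suspect_names(names):
    """Return the entries of a gene list that look like spreadsheet conversions."""
    return [name for name in names if looks_converted(name)]


if __name__ == "__main__":
    sample = ["SEPT2", "2-Sep", "TP53", "2.31E+13", "1-Mar"]
    print(flag_suspect_names(sample))
```

            A pattern check like this can only flag suspicious cells; recovering the original gene name from a converted value generally requires the source data.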

              Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot

              The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions that are explored in this regard in the context of the Hiberlink project.

                Author and article information

                Journal
                PLoS Biology (PLoS Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1544-9173; eISSN: 1545-7885
                Published: 29 June 2017
                Volume 15, Issue 6, e2001414
                Affiliations
                [1] Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
                [2] European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
                [3] ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
                [4] Berkeley Natural History Museums, University of California at Berkeley, Berkeley, California, United States of America
                [5] Institute of Data Science, Maastricht University, Maastricht, the Netherlands
                [6] School of Computer Science, The University of Manchester, Manchester, United Kingdom
                [7] Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom
                [8] Institute of Experimental Genetics, Helmholtz Centre Munich, German Research Center for Environmental Health, Neuherberg, Germany
                [9] Center for Research in Biological Systems, University of California San Diego, La Jolla, California, United States of America
                [10] Babraham Institute, Cambridge, United Kingdom
                [11] European Molecular Biology Laboratory, Heidelberg, Germany
                [12] Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
                [13] California Digital Library, Oakland, California, United States of America
                [14] Science and Technology Facilities Council, Daresbury Laboratory, Warrington, United Kingdom
                [15] Genomics Coordination Center, Department of Genetics, University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, Groningen, the Netherlands
                [16] Scientific Databases and Visualization at Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
                [17] Institute for Medical Informatics, Bern University of Applied Sciences, Engineering and Information Technology, Bern, Switzerland
                [18] Manchester Institute of Biology, University of Manchester, Manchester, United Kingdom
                [19] Department of Biochemistry, Stellenbosch University, Stellenbosch, South Africa
                [20] Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals, University of Manchester, Manchester, United Kingdom
                [21] Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
                [22] Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-9353-5498
                Article
                Publisher ID: pbio.2001414
                DOI: 10.1371/journal.pbio.2001414
                PMCID: PMC5490878
                PMID: 28662064
                4deea3dd-22ce-4e77-abd1-74ed56f93703
                © 2017 McMurry et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Figures: 6, Tables: 3, Pages: 18
                Funding
                NIH https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=R24OD011883&arg_ProgOfficeCode=205 (grant number R24OD011883 “Monarch Initiative”). Received by JA McMurry, CJ Mungall, MA Haendel, NL Washington.
                NIH https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U41HG007822&arg_ProgOfficeCode=55 (grant number U41HG007822 “UniProt”). Received by MJ Martin.
                NIH https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U24AI117966&arg_ProgOfficeCode=104 (grant number U24AI117966 “bioCADDIEfor”). Received by SA Sansone, A Gonzalez-Beltran, P Rocca-Serra, J McMurry, J Grethe, L Winfree, C Mungall, T Conlin, M Dumontier.
                NIH https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U54AI117925&arg_ProgOfficeCode=104 (grant number U54AI117925 “CEDAR”). Received by M Dumontier, SA Sansone, A Gonzalez-Beltran, P Rocca-Serra.
                NIH https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=P41HG002273&arg_ProgOfficeCode=55 (grant number NHGRI P41HG002273-09 “Gene Ontology Consortium”). Received by CJ Mungall.
                Department of Energy, received from the Director, Office of Science, Office of Basic Energy Sciences http://science.energy.gov/bso/contract-management/ (grant number DE-AC02-05CH11231). Received by CJ Mungall, NL Washington.
                The Drug Disease Model Resources http://www.imi.europa.eu/content/ddmore (grant number 115156 “Innovative Medicines Initiative”). Received by C Laibe.
                The European Commission http://cordis.europa.eu/projects/675728 (grant number 675728 “BioMedBridges project”). Received by JA McMurry, T Burdett, N Juty, S Jupp, C Morris.
                The European Commission http://cordis.europa.eu/projects/312455 (grant number 312455 “Infrastructure for Systems Biology—Europe ISBE”). Received by N Juty, H Hermjakob, C Goble.
                The European Commission http://cordis.europa.eu/projects/654248 (grant number 654248 “Coordinated Research Infrastructures Building Enduring Life-science services”). Received by C Goble.
                The European Commission http://cordis.europa.eu/projects/601043 (grant number 601043 “DIACHRONfor”). Received by S Jupp.
                The European Commission https://www.elixir-europe.org/about-us/how-funded (grant number “ELIXIR core funding”). Received by N Blomberg, R Jimenez.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BB/L005069/1 (grant number BB/L005069/1 “ELIXIR-UK, Oxford”). Received by SA Sansone, A Gonzalez-Beltran, P Rocca-Serra.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BB/M013189/1 (grant number BB/M013189/1 “DMM Core”). Received by C Goble, J Snoep, N Stanford.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BB/K019783/1 (grant number BB/K019783/1 “Continued development of ChEBIfor”). Received by N Swainston.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BBS/E/B/000C0419 (grant number BBS/E/B/000C0419 “A systems approach to understanding lipid, Ca2+ and MAPK signalling networks”). Received by N Le Novère.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BB/M006891/1 (grant number BB/M006891/1 “EMPATHY”). Received by N Swainston.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BB/M017702/1 (grant number BB/M017702/1 “SYNBIOCHEM”). Received by N Swainston, A Williams, D Fellows.
                BBSRC http://www.bbsrc.ac.uk/research/grants-search/AwardDetails/?FundingReference=BB/L005050/1 (grant number BB/L005050/1 “ELIXIR-UK, Manchester”). Received by SA Sansone, A Gonzalez-Beltran, C Goble.
                For every grant listed, the funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Perspective
                Computer and Information Sciences > Computer Networks > Internet
                Research and Analysis Methods > Research Facilities > Information Centers > Archives
                Biology and Life Sciences > Ecology > Biodiversity
                Ecology and Environmental Sciences > Ecology > Biodiversity
                Computer and Information Sciences > Data Management > Ontologies
                Research and Analysis Methods > Database and Informatics Methods > Biological Databases > Sequence Databases
                Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Databases
                Social Sciences > Linguistics > Phonology > Syntax
                Biology and Life Sciences > Anatomy > Digestive System > Gastrointestinal Tract > Colon
                Medicine and Health Sciences > Anatomy > Digestive System > Gastrointestinal Tract > Colon
                Biology and Life Sciences > Genetics > Genomics > Animal Genomics > Mammalian Genomics

                Life sciences