
      FAIR Principles: Interpretations and Implementation Considerations

      Data Intelligence
      MIT Press - Journals


          Abstract

          The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations, or when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.

          Related collections

          Most cited references (15)


          FAIRsharing as a community approach to standards, repositories and policies

          To the Editor — Community-developed standards, such as those for the identification 1 , citation 2 and reporting 3 of data, underpin reproducible and reusable research, aid scholarly publishing, and drive both the discovery and the evolution of scientific practice. The number of these standardization efforts, driven by large organizations or at the grassroots level, has been on the rise since the early 2000s. Thousands of community-developed standards are available (across all disciplines), many of which have been created and/or implemented by several thousand data repositories. Nevertheless, their uptake by the research community has been slow and uneven mainly because investigators lack incentives to follow and adopt standards. Uptake is further compromised if standards are not promptly implemented by databases, repositories and other research tools, or endorsed by infrastructures. Furthermore, the fragmentation of community efforts results in the development of arbitrarily different, incompatible standards. In turn, this leads to standards becoming rapidly obsolete in fast-evolving research areas. As with any other digital object, standards, databases and repositories are dynamic in nature, with a ‘life cycle’ that encompasses formulation, development and maintenance; their status in this cycle may vary depending on the level of activity of the developing group or community. There is an urgent need for a service that enhances the information available on the evolving constellation of heterogeneous standards, databases and repositories; guides users in the selection of these resources; and works with developers and maintainers of these resources to foster collaboration and promote harmonization. Such a service is vital to reduce the knowledge gap among those involved in producing, managing, serving, curating, preserving, publishing or regulating data. A diverse set of stakeholders, representing academia, industry, funding agencies, standards organizations, infrastructure providers and scholarly publishers—both national and domain-specific as well as global and general organizations—have come together as a community, representing the core adopters, advisory board members, and/or key collaborators of the FAIRsharing resource (https://fairsharing.org/communities). Here we introduce its mission and community network. We evaluate the standards landscape, focusing on those for reporting data and metadata and their implementation by databases and repositories. We report on the ongoing challenge to recommend resources and the importance of making standards invisible to the end users. Finally, we highlight the role each stakeholder group must play to maximize the visibility and adoption of standards, databases and repositories. Mapping the landscape and tracking evolution Working with and for data producers and consumers, and taking advantage of our large network of international collaborators, we have iteratively 3–5 developed FAIRsharing (https://fairsharing.org), an informative and educational resource that describes and interlinks community-driven standards, databases, repositories and data policies. As of February 2019, FAIRsharing has over 2,620 records: 1,293 standards, 1,209 databases and 118 data policies (of which 82 are from journals and publishers and 23 from funders), covering natural sciences (for example, biomedical, chemistry, astronomy, agriculture, earth sciences and life sciences), engineering, and humanities and social sciences. 
Using community participation, the FAIRsharing team precisely curates information on standards employed for the identification, citation and reporting of data and metadata, via four standards subtypes. First, minimum reporting guidelines—also known as guiding principles or checklists—outline the necessary and sufficient information vital for contextualizing and understanding a digital object. Second, terminology artifacts or ‘semantics’, ranging from dictionaries to ontologies, provide definitions and unambiguous identification for concepts and objects. Third, models and formats define the structure and relationship of information for a conceptual model and include transmission formats to facilitate the exchange of data between different systems. And lastly, identifier schemata are formal systems for resources and other digital objects that allow their unique and unambiguous identification. FAIRsharing monitors the evolution of these standards, their implementation in databases and repositories, and recommendation by journal and funder data policies. Producers of standards, databases and repositories are able to claim the records for the resources they maintain or have developed; this functionality allows them to gain personal recognition and ensures that the description is accurate and up-to-date. All records and related updates by the maintainers are checked by a FAIRsharing curator. Conversely, if a record is updated by a FAIRsharing curator, an e-mail notification is sent to the record claimant, minimizing the introduction of inaccuracies. In communication with the community behind each resource, FAIRsharing assigns indicators to show the status in the resource’s life cycle: ‘Ready’ for use, ‘In Development’, ‘Uncertain’ (when any attempt to reach out to the developing community has failed), and ‘Deprecated’ (when the community no longer mandates its use, together with an explanation where available). To make standards, databases, repositories and data policies more discoverable and citable, we mint digital object identifiers (DOIs) for each record, which provides a persistent and unique identifier to enable referencing of these resources. In addition, the maintainers of each record can be linked with their Open Research and Contributor IDentifier (ORCID) profile (https://orcid.org). Citing a FAIRsharing record for a standard, database and repository offers an at-a-glance view of all descriptors and indicators pertaining to a resource, as well as any evidence of adoption or endorsement by a data policy or organization. Referencing the record together with the resource’s main paper (which provides a snapshot of its status at a given time) provides a complete reference for a resource. FAIRsharing has its own record to serve this very purpose: 10.25504/FAIRsharing.2abjs5. 
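The record DOIs described above resolve through the ordinary DOI infrastructure, so a citation such as 10.25504/FAIRsharing.2abjs5 can be dereferenced programmatically. A minimal sketch, assuming only the public doi.org resolver and the third-party Python requests library (neither is prescribed by the article):

```python
# Minimal sketch: dereference a FAIRsharing record DOI to its landing page.
# Assumes the public doi.org resolver and the third-party `requests` library;
# the DOI is the FAIRsharing self-record quoted in the text above.
import requests

RECORD_DOI = "10.25504/FAIRsharing.2abjs5"

response = requests.get(f"https://doi.org/{RECORD_DOI}", allow_redirects=True, timeout=30)
print(response.url)          # final landing page the DOI resolves to
print(response.status_code)  # 200 indicates the record resolved successfully
```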
FAIRsharing collects the necessary information to ensure that standards, databases, repositories and data policies align with the FAIR data principles 6: Findable (for example, by providing persistent and unique identifiers, and functionalities to register, claim, maintain, interlink, search and discover them), Accessible (for example, identifying their level of openness and/or license type), Interoperable as much as possible (for example, highlighting which repositories implement the same standards to structure and exchange data) and Reusable (for example, knowing the coverage of a standard and its level of endorsement by a number of repositories should encourage its use or extension in neighboring domains, rather than reinvention). FAIRsharing collaborates with many other infrastructure resources to cross-link each record to other registries, as well as within major FAIR-driven global initiatives, research and infrastructure programs, many of which are generic and cross-disciplinary. A 'live', updated list is maintained at https://fairsharing.org/communities, with the roles that FAIRsharing plays. An example is the FAIR Metrics working group (http://fairmetrics.org) 7, where we work to guide producers of standards, databases and repositories to assess the level of FAIRness of their resource. We will develop measurable indicators of maturity, which will be progressively implemented in the FAIRsharing registry. The content within FAIRsharing is licensed via the Creative Commons Attribution ShareAlike 4.0 license (CC BY-SA 4.0); the ShareAlike clause enhances the open heritage and aims to create a larger open commons, ensuring that downstream users share back.

We say we need standards, but do we use them?

The scientific community, funders and publishers all endorse the concept that common data and metadata standards underpin data reproducibility, ensuring that the relevant elements of a dataset are reported and shared consistently and meaningfully. However, navigating through the many standards available can be discouraging and often unappealing for prospective users. Bound within a particular discipline or domain, reporting standards are fragmented, with gaps and duplications, thereby limiting their combined use. Although standards should stand alone, they should also function well together, especially to better support not only multidimensional data but also the aggregation of pre-existing datasets from one or more disciplines or domains. Understanding how they work or how to comply with them takes time and effort. Measuring the uptake of standards, however, is not trivial, and achieving a full picture is practically impossible. FAIRsharing provides a snapshot of the standards landscape, which is dynamic and will continue to evolve as we engage with more communities and verify the information we house, add new resources, track their life-cycle status and usage in databases and repositories, and link out to examples of training material. FAIRsharing also plays a fundamental role in the activation of the decision-making chain, which is an essential step toward fostering the wider adoption of standards. When a standard is mature and appropriate standard-compliant systems become available, such as databases and repositories, these must then be channeled to the relevant stakeholder community, who in turn must recommend them (for example, in data policies)—and ultimately may require them—or use them (for example, to define a data management plan) to facilitate a high-quality research cycle.
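As a purely illustrative sketch of how such FAIR-relevant descriptors can be carried in machine-readable metadata (this is not FAIRsharing's actual record schema), the snippet below builds a Schema.org-style JSON-LD description of a hypothetical dataset: a persistent identifier for Findability, an explicit license for Accessibility and Reusability, and declared community standards for Interoperability. The dataset DOI is invented; the two standard record DOIs are the MIAME and Gene Ontology records cited elsewhere in this text.

```python
# Illustrative sketch only: a Schema.org-style JSON-LD description of a dataset,
# showing where FAIR-relevant descriptors (identifier, license, standards used)
# can be expressed in machine-readable metadata. The dataset values are invented.
import json

record = {
    "@context": [
        "https://schema.org",
        # dcterms:conformsTo is borrowed here to point at the standards the dataset follows
        {"conformsTo": "http://purl.org/dc/terms/conformsTo"},
    ],
    "@type": "Dataset",
    "name": "Example microarray study (hypothetical)",
    "identifier": "https://doi.org/10.1234/example-dataset",    # Findable: persistent identifier (invented)
    "license": "https://creativecommons.org/licenses/by/4.0/",  # Accessible/Reusable: explicit license
    "conformsTo": [                                              # Interoperable: shared community standards
        "https://doi.org/10.25504/FAIRsharing.32b10v",  # MIAME reporting guideline record
        "https://doi.org/10.25504/FAIRsharing.6xq0ee",  # Gene Ontology terminology artifact record
    ],
}

print(json.dumps(record, indent=2))
```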
As of February 2019, 166 of FAIRsharing's 1,293 community standards are generic and multidisciplinary and the rest are discipline specific (encompassing life, agricultural, health, biomedical, environmental, humanities and engineering sciences). 133 reporting guidelines (out of 154), 641 terminology artifacts (out of 728), 357 models/formats (out of 387), and 10 identifier schemata (out of 11) are mature and tagged as 'Ready' for use. Table 1 displays the top ten most-accessed data and metadata standard records in FAIRsharing during 2018. This ranking most likely reflects the popularity of a standard rather than directly correlating with the level of standard adoption (by journal and funder data policies, or by databases and repositories). The ranking is also very variable and can change substantially from year to year, which may reflect the differing levels of activity focused on standard development in a particular research community over time.

Table 1. As of February 2019, the 12 data and metadata standards in the top ten positions (all tagged as 'Ready'), ranked according to the page views in 2018 and subsequently ordered by the number of journals or publishers recommending them.

Rank | Name (record) | Type | Page views in 2018 | Journal/publisher policies recommending it | Databases/repositories implementing it
1 | Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM), 10.25504/FAIRsharing.dvxkzb | Model/format | 287 | 0 | 0
2 | Minimum Information about any (x) Sequence (MIxS), 10.25504/FAIRsharing.9aa0zp | Reporting guideline | 284 | 3 | 8
3 | Minimum Information About a Microarray Experiment (MIAME), 10.25504/FAIRsharing.32b10v | Reporting guideline | 247 | 2 | 11
4 | Minimum Information about a high-throughput nucleotide SEQuencing Experiment (MINSEQE), 10.25504/FAIRsharing.a55z32 | Reporting guideline | 246 | 1 | 4
5 | The FAIR Principles (FAIR), https://fairsharing.org/FAIRsharing.WWI10U | Reporting guideline | 214 | 0 (a) | 2 (a)
6 | Minimum Information about a Flow Cytometry Experiment (MIFlowCyt), 10.25504/FAIRsharing.kcnjj2 | Reporting guideline | 170 | 0 | 2
7 | Schema.org, https://fairsharing.org/FAIRsharing.hzdzq8 | Model/format | 163 | 0 | 29
8 | Gene Ontology (GO), 10.25504/FAIRsharing.6xq0ee | Terminology artifact | 149 | 0 | 159
9 | Core Attributes of Biological Databases (BioDBCore), 10.25504/FAIRsharing.qhn29e | Reporting guideline | — | 0 | 2
10 | DataCite Metadata Schema, 10.25504/FAIRsharing.me4qwe | Model/format | 145 | 7 | 14

(a) Although almost universally accepted, the use of the FAIR principles is implicit. FAIRsharing is working with both policy makers and repositories to raise awareness of the FAIR principles and we therefore expect these numbers to rise in the coming years.

Table 2 displays the top ten data and metadata standard records that have been implemented by databases and repositories, providing a realistic measure of the use of data and metadata standards to annotate, structure and share datasets. Surprisingly, with the exception of one (the US National Center for Biotechnology Information (NCBI) Taxonomy, a terminology artifact for taxonomic information: 10.25504/FAIRsharing.fj07xj), none of the other nine standards is explicitly recommended in journals' and databases' data policies, including the standard most implemented by databases and repositories (the FASTA Sequence Format, a model/format for representing either nucleotide sequences or peptide sequences: 10.25504/FAIRsharing.rz4vfg). This omission can probably be explained by the fact that, created in 1985, this is a de facto standard that every sequence database and repository implements by default, thus becoming (positively) 'invisible' to users, including publishers and journals.

Table 2. As of February 2019, the top ten data and metadata standards (all tagged 'Ready'), ranked according to the number of implementations by databases and repositories.

Rank | Name (record) | Type | Databases/repositories implementing it | Journal/publisher policies recommending it | Page views in 2018
1 | FASTA Sequence Format, 10.25504/FAIRsharing.rz4vfg | Model/format | 253 | 0 | 149
2 | Gene Ontology (GO), 10.25504/FAIRsharing.6xq0ee | Terminology artifact | 159 | 0 | 149
3 | Protein Data Bank (PDB) Format, 10.25504/FAIRsharing.9y4cqw | Model/format | 59 | 0 | 10
4 | Generic Feature Format Version 3 (GFF3), 10.25504/FAIRsharing.dnk0f6 | Model/format | 48 | 0 | 7
5 | Chemical Entities of Biological Interest (ChEBI), 10.25504/FAIRsharing.62qk8w | Terminology artifact | 35 | 0 | 35
6 | NCBI Taxonomy (NCBITAXON), 10.25504/FAIRsharing.fj07xj | Terminology artifact | 32 | 3 | 104
7 | GenBank Sequence Format, 10.25504/FAIRsharing.rg2vmt | Model/format | 29 | 0 | 39
8 | Schema.org, 10.25504/FAIRsharing.hzdzq8 | Model/format | 29 | 0 | 119
9 | Sequence Ontology (SO), 10.25504/FAIRsharing.6bc7h9 | Terminology artifact | 28 | 0 | 15
10 | Molecular Interaction Tabular (MITAB), 10.25504/FAIRsharing.ve0710 | Model/format | 18 | 0 | 13

To understand how journals and publishers select which resource to recommend (https://fairsharing.org/recommendations), we have worked closely with the editors from the following eight journals or publishers: EMBO Press, F1000Research, Oxford University Press's GigaScience, PLOS, Elsevier and Springer Nature's BioMed Central and Scientific Data. As shown in Table 3 (https://fairsharing.org/article/live_list_standards_in_policies), as of February 2019, the 13 data policies of these journals or publishers recommend a total of 33 standards: 18 reporting guidelines, 8 terminology artifacts and 7 models/formats. Surprisingly, out of these 33, only 1 (the NCBI Taxonomy) is in the top ten standards most implemented by databases and repositories (as shown in Table 2), whereas one-third (10 reporting guidelines and 1 terminology artifact) are not even implemented. Furthermore, these data policies recommend 187 (generalist and domain-specific) databases and repositories. The 26 that occupy the top five positions are shown in Table 4 (https://fairsharing.org/article/live_list_databases_in_policies). As expected, this top tier includes public databases and repositories from major research and infrastructure providers from the United States and Europe; the domain-specific UniProt Knowledgebase (10.25504/FAIRsharing.s1ne3g) is at the top of the list with the highest number of standards implemented. However, this analysis also indicates that an additional 185 standards that are implemented by the recommended databases and repositories are not explicitly mentioned at all in these 13 journals' or publishers' data policies.
Table 3. As of February 2019, the 33 standards in the top five positions (all tagged 'Ready'), ranked according to the number of recommendations by 13 journals' or publishers' data policies (see main text) and subsequently ordered by the number of databases and repositories that implement them.

Rank | Name (record) | Type | Journal/publisher policies recommending it | Databases/repositories implementing it | Page views in 2018
1 | FORCE11 Data Citation Principles (FORCE11 DC), 10.25504/FAIRsharing.9hynwc | Reporting guideline | 9 | 3 | 27
1 | Animals in Research: Reporting In Vivo Experiments (ARRIVE), 10.25504/FAIRsharing.t58zhj | Reporting guideline | 9 | 0 | 60
1 | CONSOlidated standards of Reporting Trials (CONSORT), 10.25504/FAIRsharing.gr06tm | Reporting guideline | 9 | 0 | 36
1 | Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA), 10.25504/FAIRsharing.gp3r4n | Reporting guideline | 9 | 0 | 39
1 | Case Reports (CARE), 10.25504/FAIRsharing.zgqy0v | Reporting guideline | 9 | 0 | 17
2 | DataCite Metadata Schema, 10.25504/FAIRsharing.me4qwe | Model/format | 7 | 14 | 85
3 | NCBI Taxonomy (NCBITAXON), 10.25504/FAIRsharing.fj07xj | Terminology artifact | 3 | 32 | 104
3 | Investigation Study Assay Tabular (ISA-Tab), 10.25504/FAIRsharing.53gp75 | Model/format | 3 | 11 | 67
3 | Minimum Information about any (x) Sequence (MIxS), 10.25504/FAIRsharing.9aa0zp | Reporting guideline | 3 | 8 | 239
4 | Minimum Information About a Microarray Experiment (MIAME), 10.25504/FAIRsharing.32b10v | Reporting guideline | 2 | 11 | 162
4 | Minimum Information About a Proteomics Experiment (MIAPE), 10.25504/FAIRsharing.8vv5fc | Reporting guideline | 2 | 4 | 62
4 | Minimum Information about a Molecular Interaction Experiment (MIMIx), 10.25504/FAIRsharing.8z3xzh | Reporting guideline | 2 | 4 | 46
4 | MIAME Notation in Markup Language (MINiML), 10.25504/FAIRsharing.gaegy8 | Model/format | 2 | 2 | 32
4 | Consolidated criteria for reporting qualitative research (COREQ), 10.25504/FAIRsharing.6mhzhj | Reporting guideline | 2 | 0 | 11
4 | STrengthening the Reporting of OBservational studies in Epidemiology (STROBE), 10.25504/FAIRsharing.1mk4v9 | Reporting guideline | 2 | 0 | 22
4 | STAndards for the Reporting of Diagnostic accuracy (STARD), 10.25504/FAIRsharing.956df7 | Reporting guideline | 2 | 0 | 15
4 | Consolidated Health Economic Evaluation Reporting Standards (CHEERS), 10.25504/FAIRsharing.neny94 | Reporting guideline | 2 | 0 | 10
4 | CONSOlidated Standards of Reporting Trials – Official Extensions (CONSORT-OE), 10.25504/FAIRsharing.wstthd | Reporting guideline | 2 | 0 | 6
4 | CONsolidated Standards of Reporting Trials – Unofficial Extensions (CONSORT-UE), 10.25504/FAIRsharing.2kq1fs | Reporting guideline | 2 | 0 | 5
5 | Systems Biology Markup Language (SBML), 10.25504/FAIRsharing.9qv71f | Model/format | 1 | 15 | 51
5 | Ontology for Biomedical Investigations (OBI), 10.25504/FAIRsharing.284e1z | Terminology artifact | 1 | 11 | 71
5 | PSI Molecular Interaction Controlled Vocabulary (PSI-MI CV), 10.25504/FAIRsharing.8qzmtr | Terminology artifact | 1 | 9 | 7
5 | Experimental Factor Ontology (EFO), 10.25504/FAIRsharing.1gr4tz | Terminology artifact | 1 | 8 | 21
5 | mz Markup Language (mzML), 10.25504/FAIRsharing.26dmba | Model/format | 1 | 7 | 30
5 | Minimal Information Required In the Annotation of Models (MIRIAM), 10.25504/FAIRsharing.ap169a | Reporting guideline | 1 | 5 | 18
5 | Environment Ontology (EnvO), 10.25504/FAIRsharing.azqskx | Terminology artifact | 1 | 5 | 23
5 | Minimal Information about a high throughput SEQuencing Experiment (MINSEQE), 10.25504/FAIRsharing.a55z32 | Reporting guideline | 1 | 4 | 129
5 | CellML, 10.25504/FAIRsharing.50n9hc | Model/format | 1 | 3 | 25
5 | BioAssay Ontology (BAO), 10.25504/FAIRsharing.mye76w | Terminology artifact | 1 | 1 | 6
5 | eagle-i Research Resource Ontology (ERO), 10.25504/FAIRsharing.nwgynk | Terminology artifact | 1 | 1 | 5
5 | ThermoML, 10.25504/FAIRsharing.7b0fc3 | Model/format | 1 | 1 | 12
5 | Units Ontology (UO), 10.25504/FAIRsharing.mjnypw | Terminology artifact | 1 | 1 | 16

Table 4. As of February 2019, the 26 databases and repositories in the top five positions (all tagged 'Ready'), ranked according to the number of recommendations by 13 journals' or publishers' data policies (see main text) and subsequently ordered by the number of standards implemented.

Rank | Name (record) | Journal/publisher policies recommending it | Standards implemented | Page views in 2018
1 | UniProt Knowledgebase (UniProtKB), 10.25504/FAIRsharing.s1ne3g | 13 | 16 | 116
1 | European Nucleotide Archive (ENA), 10.25504/FAIRsharing.dj8nt8 | 13 | 9 | 165
1 | ArrayExpress, 10.25504/FAIRsharing.6k0kwd | 13 | 7 | 173
1 | GenBank, 10.25504/FAIRsharing.9kahy4 | 13 | 9 | 386
1 | FAIRsharing, 10.25504/FAIRsharing.2abjs5 | 13 | 6 | 122
1 | Gene Expression Omnibus (GEO), 10.25504/FAIRsharing.5hc8vt | 13 | 4 | 106
2 | PRoteomics IDEntifications database (PRIDE), 10.25504/FAIRsharing.e1byny | 12 | 14 | 73
2 | MetaboLights (MTBLS), 10.25504/FAIRsharing.kkdpxe | 12 | 8 | 197
2 | PANGAEA – Data Publisher for Earth and Environmental Science, 10.25504/FAIRsharing.6yw6cp | 12 | 7 | 22
3 | MGnify – EBI Metagenomics, 10.25504/FAIRsharing.dxj07r | 11 | 5 | 70
3 | Sequence Read Archive (SRA), 10.25504/FAIRsharing.g7t2hv | 11 | 4 | 135
3 | figshare, 10.25504/FAIRsharing.drtwnh | 11 | 2 | 415
3 | Open Science Framework (OSF), 10.25504/FAIRsharing.g4z879 | 11 | 0 | 489
3 | OpenNeuro, 10.25504/FAIRsharing.s1r9bw | 11 | 1 | 85
3 | Database of Genomic Variants Archive (DGVA), 10.25504/FAIRsharing.txkh36 | 11 | 0 | 47
3 | European Variation Archive (EVA), 10.25504/FAIRsharing.6824pv | 11 | 2 | 29
3 | Coherent X-ray Imaging Data Bank (CXIDB), 10.25504/FAIRsharing.y6w78m | 11 | 2 | 30
4 | The European Genome-phenome Archive (EGA), 10.25504/FAIRsharing.mya1ff | 10 | 6 | 68
4 | The Cancer Imaging Archive (TCIA), 10.25504/FAIRsharing.jrfd8y | 10 | 1 | 102
5 | NCBI BioSample, 10.25504/FAIRsharing.qr6pqk | 9 | 3 | 12
5 | RCSB Protein Data Bank (RCSB PDB), 10.25504/FAIRsharing.2t35ja | 9 | 2 | 29
5 | Crystallography Open Database (COD), 10.25504/FAIRsharing.7mm5g5 | 9 | 0 | 63
5 | NeuroVault, 10.25504/FAIRsharing.rm14bx | 9 | 1 | 9
5 | National Addiction & HIV Data Archive Program (NAHDAP), 10.25504/FAIRsharing.k34tv5 | 9 | 0 | 116
5 | NCBI Trace Archives, 10.25504/FAIRsharing.abwvhp | 9 | 0 | 31
5 | HUGO Gene Nomenclature Committee (HGNC), 10.25504/FAIRsharing.29we0s | 9 | 0 | 10

If one looks instead at all 82 journals' or publishers' data policies curated in FAIRsharing (instead of just 13), one sees the same discrepancy. As of February 2019, only 66 data policies mention one or more specific standards (https://fairsharing.org/article/live_list_journal_policies); the minimal reporting guidelines are recommended 17 times as often as terminology artifacts and 12 times as often as models/formats (and models/formats are heavily implemented by data repositories); databases are recommended 702 times, with 187 databases recommended in total, 44 times as often as models/formats. Based on ongoing activity with the eight journals and publishers mentioned above, along with other interested parties such as eLife, Taylor & Francis Group, Wiley and Hindawi (https://fairsharing.org/communities), we understand this discrepancy in recommendation to be the consequence of a cautious approach to choosing which standard to recommend where thousands of (often competing) standards are available. It is understandable if journals or publishers do not overreach.
Recommendation of a standard is often driven by the editor’s familiarity with one or more standards, notably for journals or publishers focusing on specific disciplines and areas of study, or the engagement with learned societies and researchers actively supporting and using certain standards. As a rule, beyond individuals involved in standards developments, the rest of a research community that journals or publishers serve is often not familiar with standards; indeed, many researchers often perceive standards as a hindrance to data reporting rather than a help. Therefore, the current trend is for journals or publishers to recommend generalist repositories and a core set of discipline-specific repositories, even though a bigger number of (public and global, project-driven, and institution-based) databases and repositories exist. Similarly, journals and publishers tend to recommend very few standards, and those they do are usually data citation standards or minimum reporting guidelines (the metadata standards more relevant to publication). The general opinion of these editors is that terminology artifacts and models/formats instead should emerge from a close collaboration between their developing community and the implementing repositories, and they should remain only implicitly suggested. FAIRsharing, therefore, is positioned to highlight to journals or publishers, as well as researchers and other stakeholders, which terminology artifacts and models/formats, along with other standards, each database and repository implements. This, along with community indicators of use and maturity, as well as emerging global certifications, is essential to inform the selection or recommendation of relevant databases and repositories. FAIRsharing aims to increase the visibility, citation and credit of these community-driven standards, databases and repository efforts. The best standards are invisible and transparent Standards for reporting of data and metadata are essential for data reuse, which drives scientific discovery and reproducibility. Minimal reporting guidelines are intended for human consumption and are usually narrative in form and therefore prone to ambiguities, making compliance and validation difficult and approximate. Many of these guidelines, however, already come with (or lead to the development of) associated models/formats and terminology artifacts, which are created to be machine readable (rather than for human consumption). These two types of standards ensure the datasets are harmonized in regard to structure, formatting and annotation, setting the foundation for the development of tools and repositories that enable transparent interpretation, verification, exchange, integrative analysis and comparison of (heterogeneous) data. The goal is to ensure the implementation of these standards in data annotation tools and data repositories, making these standards invisible to the end users. Models/formats and terminology artifacts are essential to the implementation of the FAIR principles that emphasize enhancing the ability of machines to automatically discover and use data and metadata. In particular, the ‘computability’ of standards is core to the development of FAIR metrics to measure the level of compliance of a given dataset against the relevant metadata descriptors. These machine-readable standards provide the necessary quantitative and verifiable measures of the degree to which data meet these reporting guidelines. 
The latter, on their own, would just be statements of unverifiable good intentions of compliance to given standards. Delivering tools and practices to create standards-based templates for describing datasets smarter and faster is essential, if we are to use these standards in the authoring of metadata for the variety of data types in the life sciences and other disciplines. FAIRsharing is already involved in ongoing community discussions around the need for common frameworks for disciplinary research data management protocols 8 . Furthermore, research activities to deliver machine-readable standards are already underway by the FAIRsharing team and collaborators 9 ; all outputs will be freely shared for others to develop tools that would make it easy to check the compliance of data to standards. Committed to community service The FAIRsharing mission is to increase guidance to consumers of standards, databases, repositories, and data policies, to accelerate the discovery, selection and use of these resources; and increase producer satisfaction in terms of resource visibility, reuse, adoption and citation. Box 1 illustrates community-provided exemplar use cases that drive our work. This is a major undertaking, but it is a journey we are not making alone. Collaborative work is happening on many fronts. We are categorizing the records according to discipline and domain via two open application ontologies. This should facilitate more accurate browsing, discovery and selection. To improve our policy registry, we are disambiguating policies from individual journals and those from publishers that encompass groups of journals. This will increase the number of journals covered and more accurately represent the different data policy models being pursued by publishers. Selection and decision-making are being improved by the enrichment of indicators based on community-endorsed and discipline-specific criteria, such as FAIR metrics and FAIRness level. To maximize the ‘look-up service’ functionality and to connect the content to other registries and tools, we are creating customizable interfaces for human as well as programmatic access to the data. We are also expanding the existing network graph and creating new visually accessible statistics (https://fairsharing.org/summary-statistics). Finally, on a monthly basis, we are highlighting featured exemplar resources, as well as adding to the informational and educational material available on FAIRsharing. Box 1 How FAIRsharing can help different stakeholders FAIRsharing offers benefits to several different stakeholders in the research endeavor. For example: Casey (a researcher) searches FAIRsharing to identify an established repository, recognized by the journal she plans to submit to, with restricted data access to deposit her sensitive datasets, as recommended by her funder’s data policy. Andrea (a biocurator) searches FAIRsharing for suitable standards to describe a set of experiments. He filters the results by disciplines, focusing on standards implemented by one or more data repositories, with available annotations tools. He also looks for examples of the most up-to-date version of the standards and the details of a person or support group to contact. Alex (a standards developer) creates and maintains a personalized collection page on FAIRsharing to list and showcase the set of standards developed by the grassroots standard organization she is the representative of. 
Alex registers the standards and/or claims existing records added by the FAIRsharing team, vetting the descriptions and/or enhancing them by adding indicators of maturity for the standards and indicating the repositories and tools implementing them. Alex's grassroots organization uses the collection to maximize the visibility of their standards, promoting adoption outside their immediate community, also favoring reuse in and extensions to other areas. Sam (a repository manager) registers the data resource at FAIRsharing manually or programmatically, describing terms of deposition and access, adding information on the resource's relationship to other repositories and use of standards, and assessing the level of FAIRness of his data repository. He links the record to the funding source(s) supporting the resource and the institute(s) hosting it, as well as his ORCID profile to get credit for his role as maintainer of a resource. Sam receives alerts if a publisher recommends the repository in a data policy, and uses the DOI assigned to the repository record to cite the evidence of adoption. Marion (a policymaker) registers a journal's data policy in FAIRsharing, creating and maintaining an interrelated list of the repositories and standards recommended to the authors, to deposit and annotate data and other digital assets. Marion keeps the data policy up to date using visualization and comparison functionalities, and consulting the knowledge graph that offers an interactive view of the repositories, tools and standards, as well as receiving customized alerts (for example, when a repository has changed its data access terms or when a standard has been superseded by another). Lesley (a data manager) consults FAIRsharing when creating a data management plan to identify the most appropriate reporting guidelines, formats and terminologies for data types, and formally cites these community standards using their DOIs and/or the 'how to cite this record' statements provided for each resource. Robin (a librarian) supports research data use in FAIRsharing by enriching educational and training material to support scholars in the use of data standards, in their ability to conform to journal and funder policies, and in developing and providing guidance that increases researchers' capability and skills, empowering them to organize and make their data FAIR.

Guidance to stakeholders

To foster a culture change within the research community into one where the use of standards, databases and repositories for FAIRer data is pervasive and seamless, we need to better promote the existence and value of these resources. First and foremost, we need to paint an accurate picture of the status quo. Several stakeholders can play catalytic roles (Fig. 1).

[Fig. 1: FAIRsharing guidance to each stakeholder group. Image by FAIRsharing.org, used under a Creative Commons BY-SA 4.0 license.]

Standards developers and database curators can use FAIRsharing to explore what resources exist in their areas of interest (and whether those resources can be used or extended), as well as enhance the discoverability and exposure of their resource. This resource might then receive credit outside of their immediate community and ultimately promote adoption. (To learn how to add your resource to FAIRsharing or to claim it, see https://fairsharing.org/new.) A representative of a community standardization initiative is best placed to describe the status of a standard and to track its evolution.
This can be done by creating an individual record (for example, the Data Documentation Initiative (DDI) standard for social, behavioral, economic, and health data; 10.25504/FAIRsharing.1t5ws6) or by grouping several records together in a collection (for example, the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) standards for proteomics and interactomics data; https://fairsharing.org/collection/HUPOPSI). To achieve FAIR data, linked data models need to be provided that allow the publishing and connecting of structured data on the web. Similarly, representatives of a database or repository are uniquely placed to describe their resource and to declare the standards implemented (for example, the Inter-university Consortium for Political and Social Research (ICPSR) archive, which uses the DDI standard (10.25504/FAIRsharing.y0df7m); or the Reactome Knowledge Base (10.25504/FAIRsharing.tf6kj8), which uses several standards in the COmputational Modeling in BIology NEtwork (COMBINE) collection, https://fairsharing.org/collection/ComputationalModelingCOMBINE). The more adopted a resource is, the greater its visibility. For example, if your standard is implemented by a repository, these two records will be interlinked; thus, if someone is interested in that repository they will see that your standard is used by that resource. If your resource is recommended in a data policy from a journal, funder or other organization, it will be given a ‘recommended’ ribbon, which is present on the record itself and clearly visible when the resource appears in search results. For journal publishers or organizations with a data policy, FAIRsharing enables the maintenance of an interrelated list of citable standards and databases, grouping those that the policy recommends to users or their community (for example, see examples of recommendations created by eight main publishers and journals; https://fairsharing.org/recommendations). As FAIRsharing continues to map the landscape, journals and publishers can also revise their selections over time, enabling the recommendation of additional resources with more confidence. All journals that do not have such data statements should develop them to ensure all data relating to an article or project are as FAIR as possible. Finally, journal editors should also encourage authors to cite the standards, database and repositories they use or develop via the ‘how to cite this record’ statement, found on each FAIRsharing record, which includes a DOI. Trainers, educators, librarians and those organizations and services involved in supporting research data can use FAIRsharing to provide a foundation on which to create or enrich educational lectures, training and teaching material, and to plug it into data management planning tools. These stakeholder communities play a pivotal role to prepare the new generation of scientists and deliver courses and tools that address the need to guide or empower researchers to organize data and to make it FAIR. Learned societies, international scientific unions and associations, and alliances of these organizations should raise awareness around standards, databases, repositories and data policies—in particular, on their availability, scope and value for FAIR and reproducible research. FAIRsharing works with many organizations that have already mobilized their community members to take action (for example, see refs. 
10–12), to promote the use and adoption of key resources, and to initiate new or participate in existing initiatives to define and implement policies and projects. Funders can use FAIRsharing to help select the appropriate resources to recommend in their data policy and highlight those resources that awardees should consider when writing their data management plan (for example, see ref. 13). Funders should recognize standards, as well as databases and repositories, as digital objects in their own right, which have and must have their own associated research, development and educational activities 14. FAIRsharing has already been identified as a key resource and service that helps in making FAIR data a reality 15. New funding frameworks need to be created to provide catalytic support for the technical and social activities around standards, in specific domains, within and across disciplines to enhance their implementation in databases and repositories, and the interoperability and reusability of data. Last but not least, researchers can use FAIRsharing as a lookup resource to identify and cite the standards, databases or repositories that exist for their data and discipline—for example, when creating a data management plan for a grant proposal or funded project, or when submitting a manuscript to a journal, to identify the recommended databases and repositories, as well as the standards they implement, to ensure all relevant information about the data is collected at the source. Today's data-driven science, as well as the growing demand from governments, funders and publishers for FAIRer data, requires greater researcher responsibility. Acknowledging that the ecosystem of guidance and tools is still a work in progress, it is essential that researchers develop or enhance their research data management skills, or seek the support of professionals in this area. FAIRsharing brings the producers and consumers of standards, databases, repositories and data policies closer together, with a growing list of adopters (https://fairsharing.org/communities). Representatives of institutions, libraries, journal publishers, funders, infrastructure programs, societies and other organizations or projects (that in turn serve and guide individual researchers or other stakeholders on research data management matters) can become adopters. We welcome collaborative proposals and are open to participating in joint projects to develop services for specific stakeholders and communities. Join us or reach out to us, and let's pave the way for FAIRer data together.

            Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud

            The FAIR Data Principles propose that all scholarly output should be Findable, Accessible, Interoperable, and Reusable. As a set of guiding principles, expressing only the kinds of behaviours that researchers should expect from contemporary data resources, how the FAIR principles should manifest in reality was largely open to interpretation. As support for the Principles has spread, so has the breadth of these interpretations. In observing this creeping spread of interpretation, several of the original authors felt it was now appropriate to revisit the Principles, to clarify both what FAIRness is, and is not.

              A design framework and exemplar metrics for FAIRness

              The FAIR Principles 1 (https://doi.org/10.25504/FAIRsharing.WWI10U) provide guidelines for the publication of digital resources such as datasets, code, workflows, and research objects, in a manner that makes them Findable, Accessible, Interoperable, and Reusable (FAIR). The Principles have rapidly been adopted by publishers, funders, and pan-disciplinary infrastructure programmes and societies. The Principles are aspirational, in that they do not strictly define how to achieve a state of "FAIRness", but rather they describe a continuum of features, attributes, and behaviors that will move a digital resource closer to that goal. This ambiguity has led to a wide range of interpretations of FAIRness, with some resources even claiming to already "be FAIR"! The increasing number of such statements, the emergence of subjective and self-assessments of FAIRness 2,3 , and the need of data and service providers, journals, funding agencies, and regulatory bodies to qualitatively or quantitatively evaluate such claims, led us to self-assemble and establish a FAIR Metrics group (http://fairmetrics.org) to pursue the goal of defining ways to measure FAIRness. As co-authors of the FAIR Principles and its associated manuscript, founding this small focus group was a natural and timely step for us, and we foresee group membership expanding and broadening according to the needs and enthusiasm of the various stakeholder communities. Nevertheless, in this first phase of group activities we did not work in isolation, but we gathered use cases and requirements from the communities, organizations and projects we are core members of, and where discussions on how to measure FAIRness have also started. Our community network and formal participation encompasses generic and discipline-specific initiatives, including: the Global and Open FAIR (http://go-fair.org), the European Open Science Cloud (EOSC; https://eoscpilot.eu), working groups of the Research Data Alliance (RDA; https://www.rd-alliance.org) and Force11 (https://www.force11.org), the Data Seal of Approval 4 , Nodes of the European ELIXIR infrastructure (https://www.elixir-europe.org), projects under the USA National Institutes of Health (NIH)’s Big Data to Knowledge Initiative (BD2K) and its new Data Commons Pilots (https://commonfund.nih.gov/bd2k/commons). In addition, via the FAIRsharing network and advisory board (https://fairsharing.org), we are also connected to open standards-developing communities and data policy leaders, and also editors and publishers, especially those very active around data matters, such as: Springer Nature’s Scientific Data, Nature Genetics and BioMedCentral, PloS Biology, The BMJ, Oxford University Press’s GigaScience, F1000Research, Wellcome Open Research, Elsevier, EMBO Press and Ubiquity Press. The converging viewpoints on FAIR metrics and FAIRness, arising from our information-gathering discussions with these various communities and stakeholders groups, can be summarized as it follows: Metrics should address the multi-dimensionality of the FAIR principles, and encompass all types of digital objects. Universal metrics may be complemented by additional resource-specific metrics that reflect the expectations of particular communities. The metrics themselves, and any results stemming from their application, must be FAIR. Open standards around the metrics should foster a vibrant ecosystem of FAIRness assessment tools. Various approaches to FAIR assessment should be enabled (e.g. 
self-assessment, task forces, crowd-sourcing, automated), however, the ability to scale FAIRness assessments to billions if not trillions of diverse digital objects is critical. FAIRness assessments should be kept up to date, and all assessments should be versioned, have a time stamp, and be publicly accessible. FAIRness assessments presented as a simple visualization, will be a powerful modality to inform users and guide the work of producers of digital resources. The assessment process, and the resulting FAIRness assessment, should be designed and disseminated in a manner that positively incentivizes the providers of digital resources; i.e., they should view the process as being fair and unbiased, and moreover, should benefit from these assessments and use them as an opportunity to identify areas of improvement. Governance over the metrics, and the mechanisms for assessing them, will be required to enable their careful evolution and address valid disagreements. Here we report on the framework we have developed, which encompasses the first iteration of a core set of FAIRness indicators that can be objectively measured by a semi-automated process, and a template that can be followed within individual scholarly domains to derive community-specific metrics evaluating FAIR aspects important to them. From the outset, the group decided that it would focus on FAIRness for machines – i.e., the degree to which a digital resource is findable, accessible, interoperable, and reusable without human intervention. This was because FAIRness for people would be difficult to measure objectively, as it would often depend on the experience and prior-knowledge of the individual attempting to find and access the data. We further agreed on the qualities that a FAIR metric should exhibit. A good metric should be: Clear: anyone can understand the purpose of the metric Realistic: it should not be unduly complicated for a resource to comply with the metric Discriminating: the metric should measure something important for FAIRness; distinguish the degree to which that resource meets that objective; and be able to provide instruction as to what would maximize that value Measurable: the assessment can be made in an objective, quantitative, machine-interpretable, scalable and reproducible manner, ensuring transparency of what is being measured, and how. Universal: The metric should be applicable to all digital resources. The goal of this working group was to derive at least one metric for each of the FAIR sub-principles that would be universally applicable to all digital resources in all scholarly domains. We recognized, however, that what is considered FAIR in one community may be quite different from the FAIRness requirements or expectations in another community – different community norms, standards, and practices make this a certainty. As such, our approach took into account that the metrics we derived would eventually be supplemented by individual community members through the creation of domain-specific or community-specific metrics. With this in mind, we developed (and utilized) a template for the creation of metrics (Table 1), that we suggest should be followed by communities who engage in this process. The outcome of this process was 14 exemplar universal metrics covering each of the FAIR sub-principles (the short names of the metrics are in brackets in the following description). The metrics request a variety of evidence from the community, some of which may require specific new actions. 
For instance, digital resource providers must provide a publicly accessible document(s) that provides machine-readable metadata (FM-F2, FM-F3) and details their plans with respect to identifier management (FM-F1B), metadata longevity (FM-A2), and any additional authorization procedures (FM-A1.2). They must ensure the public registration of their identifier schemes (FM-F1A), (secure) access protocols (FM-A1.1), knowledge representation languages (FM-I1), licenses (FM-R1.1), provenance specifications (FM-R1.2). Evidence of ability to find the digital resource in search results (FM-F4), linking to other resources (FM-I3), FAIRness of linked resources (FM-I2), and meeting community standards (FM-R1.3) must also be provided. The current metrics are available for public discussion at the FAIR Metrics GitHub, with suggestions and comments being made through the GitHub comment submission system (https://github.com/FAIRMetrics). They are free to use for any purpose under the CC0 license. Versioned releases will be made to Zenodo as the metrics evolve, with the first release already available for download 5 . We performed an evaluation of these preliminary metrics by inviting a variety of resources to participate in a self-evaluation, where each metric was represented by one or more questions. Nine individuals/organizations responded to the questionnaire, where we emphasized that the objective was not to evaluate their resource, but rather, to evaluate the legitimacy, clarity, and utility of the metrics themselves. This process made it clear that certain metrics (and in some cases, the FAIR Principle underlying it) were not always well-understood. The questionnaire, responses, and evaluation are available in the Zenodo deposit 5 , and a discussion around the responses, what constitutes a "good" answer, and how to quantitatively evaluate an answer, is ongoing, and open to the public on GitHub. Finally, we envision a framework for the automated evaluation of metrics, leveraging on a core set of existing work and resources that will progressively become part of an open ecosystem of FAIR-enabled (and enabling) tools. Each metric will be self-describing and programmatically executable using the smartAPI 6 specification, an initiative that extends on the OpenApi specification with semantic metadata. FAIRsharing 7 will provide source information on metadata, identifier schemas and other standards, which are core elements to many metrics. A “FAIR Accessor” 8 will be used to publish groups of metrics together with metadata describing, for example, the community to which this set of metrics should be applied, the author of the metrics set, and so on. An application will discover an appropriate suite of metrics, gather the information required by each metric’s smartAPI (through an automated mechanism or through a questionnaire), and then execute the metric evaluation. The output will be an overall score of FAIRness, a detailed explanation of how the score was derived (inputs/outputs for each metric) and some indication of how the score could be improved. Anyone may run the metrics evaluation tool in order to, for example, guide their own FAIR publication strategies; however, we anticipate that community stakeholder organizations and other agencies may also desire to run the evaluation over critical resources within their communities, and openly publish the results. 
For example, FAIRsharing will also be one of the repositories that will store, and make publicly available, FAIRness grade assessments for digital resources evaluated by our framework, using the core set of metrics. Measurements of FAIRness are, in our opinion, tangential to other kinds of metrics, such as measurements of openness 9 or measurements of reuse or citation. While we appreciate the added value that open data provides, we have made it clear that openness is not a requirement of FAIRness 10 , since there are data that cannot be made public due to privacy or confidentiality reasons. Nevertheless, these data can reach a high level of FAIRness by, for example, providing public metadata describing the nature of the data source, and by providing a clear path by which data access can be requested. With respect to reuse and citation, we believe that increasing the FAIRness of digital resources maximizes their reuse, and that the availability of an assessment provides feedback to content creators about the degree to which they enable others to find, access, interoperate-between and reuse their resources. We note, however, that the FAIR-compliance of a resource is distinct from its impact. Digital resources are not all of equal quality or utility, and the size and scope of their audience will vary. Nevertheless, all resources should be maximally discoverable and reusable as per the FAIR principles. While this will aid in comparisons between them, and assessment of their quality or utility, we emphasize that metrics that assess the popularity of a digital resource are not measuring its FAIRness. With this in-mind, and with a template mechanism in-place to aid in the design of new metrics, we now open the process of metrics creation for community participation. All interested stakeholders are invited to comment and/or contribute via the FAIR Metrics GitHub site. Additional information How to cite this article: Wilkinson, M. D. et al. A design framework and exemplar metrics for FAIRness. Sci. Data 5:180118 doi: 10.1038/sdata.2018.118 (2018). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
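The automated evaluation workflow described in this reference (discover a suite of metrics, gather the evidence each one requires via its smartAPI description, execute the checks, and report an overall FAIRness score with explanations and advice) can be pictured with a rough Python sketch. The metric identifiers echo the FM-* names listed above, but the checks, fields and scoring below are invented for illustration and are not the working group's implementation.

```python
# Rough sketch of the evaluation loop described above: run a suite of metric
# checks over a resource's metadata and report a FAIRness score with
# per-metric explanations. Checks and data structures are illustrative only;
# the real framework executes smartAPI-described metrics, not these functions.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Metric:
    identifier: str                # an FM-* style name, as in the text above
    principle: str                 # FAIR sub-principle the metric addresses
    check: Callable[[Dict], bool]  # returns True if the evidence satisfies the metric
    advice: str                    # how the score could be improved


METRICS: List[Metric] = [
    Metric("FM-F1A", "F1", lambda m: bool(m.get("identifier_scheme")),
           "Register and declare a globally unique identifier scheme."),
    Metric("FM-F2", "F2", lambda m: bool(m.get("machine_readable_metadata_url")),
           "Publish machine-readable metadata at a resolvable location."),
    Metric("FM-A1.1", "A1.1", lambda m: m.get("access_protocol") in {"https", "ftp"},
           "Use an open, free, universally implementable access protocol."),
    Metric("FM-R1.1", "R1.1", lambda m: bool(m.get("license_url")),
           "Attach an explicit, resolvable usage license."),
]


def evaluate(metadata: Dict) -> Dict:
    """Run every metric and return an overall score plus per-metric explanations."""
    results = [(m, m.check(metadata)) for m in METRICS]
    return {
        "score": sum(passed for _, passed in results) / len(results),
        "details": [
            {"metric": m.identifier, "principle": m.principle,
             "passed": passed, "advice": None if passed else m.advice}
            for m, passed in results
        ],
    }


if __name__ == "__main__":
    # Invented example metadata for a hypothetical resource.
    report = evaluate({
        "identifier_scheme": "DOI",
        "machine_readable_metadata_url": "https://example.org/dataset.jsonld",
        "access_protocol": "https",
        "license_url": None,
    })
    print(report["score"])  # 0.75 for this example: three of four checks pass
    for row in report["details"]:
        print(row)
```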

                Author and article information

                Journal
                Data Intelligence
                MIT Press - Journals
                2641-435X
                January 2020
                Volume: 2
                Issue: 1-2
                Pages: 10-29
                Affiliations
                [1 ]Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
                [2 ]Institute of Data Science, Maastricht University, Universiteitssingel 60, Maastricht 6229 ER, The Netherlands
                [3 ]Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
                [4 ]Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
                [5 ]School of Chemistry, Faculty of Engineering and Physical Sciences, University of Southampton, SO17 1BJ, UK
                [6 ]Amsterdam UMC, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
                [7 ]European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
                [8 ]Harvard University, Cambridge, Massachusetts 02138, USA
                [9 ]Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University, Maastricht 6229 ER, The Netherlands
                [10 ]Conceptual and Cognitive Modeling Research Group (CORE), Free University of Bozen-Bolzano, Bolzano 39100, Italy
                [11 ]Aalborg University, Aalborg DK-9220, Denmark
                [12 ]Insight Centre for Data Analytics, National University of Ireland Galway, H91 TK33, Ireland
                [13 ]Centre for Digital Scholarship, Leiden University Libraries, Leiden, 2333 ZA, The Netherlands
                [14 ]Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
                [15 ]Dutch Techcentre for Life Sciences (DTL), Utrecht, The Netherlands
                [16 ]SURF, Utrecht 3511 EP, The Netherlands
                [17 ]Keith G Jeffery Consultants, Faringdon, UK
                [18 ]Castor EDC, Paasheuvelweg 25, Wing 5D, 1105 BP, Amsterdam, The Netherlands
                [19 ]San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
                [20 ]Learning and Research Resources Centre (CRAI), Universitat de Barcelona, 08007 Barcelona, Spain
                [21 ]Environment Agency Austria, A-1090 Vienna, Austria
                [22 ]University of Notre Dame, 75004 Paris, France
                [23 ]Health Research Board (HRB), Dublin 2, DO2 H638, Ireland
                [24 ]Liacs Institute of Advanced Computer Science, Leiden University, 2311 GJ Leiden, The Netherlands
                [25 ]Czech Technical University in Prague, Faculty of Information Technology (FIT CTU), 160 00 Prague 6, Czech Republic
                [26 ]GO FAIR International Support & Coordination Office (GFISCO), Leiden, The Netherlands
                [27 ]Harvard Catalyst | Clinical and Translational Science Center, Boston, MA 02115, USA
                [28 ]US National Academy of Sciences, Washington DC 20418, USA
                [29 ]Micelio, Ekeren, Antwerp, Belgium
                [30 ]Deutsches Klimarechenzentrum, Bundesstrasse 45a, 20146 Hamburg, Germany
                [31 ]Center for Plant Biotechnology and Genomics UPM-INIA, Madrid 28040, Spain
                [32 ]Max Planck Computing and Data Facility, Gießenbachstraße 2, 85748 Garching, Germany
                [33 ]Leiden Center for Data Science, 2311 EZ Leiden, The Netherlands
                Article
                10.1162/dint_r_00024
                © 2020
