      Is Open Access

      Federated discovery and sharing of genomic data using Beacons

      research-article


          Abstract

To the Editor: The Beacon Project (https://github.com/ga4gh-beacon/) is a Global Alliance for Genomics & Health (GA4GH) [1] initiative that enables genomic and clinical data sharing across federated networks. The project is working toward developing regulatory, ethics and security guidance to ensure proportionate safeguards for distribution of data according to the GA4GH-developed “Framework for Responsible Sharing of Genomic and Health-Related Data” [2]. Here we describe the Beacon protocol and how it can be used as a model for the federated discovery and sharing of genomic data.

A Beacon is defined as a web-accessible service that can be queried for information about a specific allele. A user of a Beacon can pose queries of the form “Have you observed this nucleotide (e.g., C) at this genomic location (e.g., position 32,936,732 on chromosome 13)?” to which the Beacon responds with either “yes” or “no.” In this way, a Beacon allows allelic information of interest to be discovered by a remote searcher with no reference to a specific sample or patient, thereby mitigating privacy risks. In principle, allelic information from any source (or species) can be distributed through a Beacon. For example, a Beacon may serve data from case-level observations, such as genetic variants identified from sequenced samples, or from annotation resources such as variant–disease associations curated from scientific literature. Along with a “yes” response, a Beacon may optionally disclose metadata associated with the queried allele, including allele frequencies, pathogenicity scores and associated phenotypes. Access to Beacons is securable through institutional systems for authentication and authorization (for example, ELIXIR AAI), allowing hosts to enforce proportionate safeguards for datasets that may be sensitive and consented for use only by trusted individuals and/or for specific purposes.
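As a rough illustration of the yes/no exchange described above, the following Python sketch models a Beacon as a lookup over a set of observed alleles. The names `AlleleQuery`, `ToyBeacon` and `has_allele` are hypothetical and are not taken from the Beacon specification; only the example coordinates come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleQuery:
    """One question: 'have you observed this base at this location?'"""
    chromosome: str  # e.g. "13"
    position: int    # genomic position, as in the text's example
    allele: str      # the nucleotide being asked about, e.g. "C"


class ToyBeacon:
    """Hypothetical in-memory Beacon that answers only yes or no.

    It stores alleles, not samples or patients, so a response
    cannot point back to any individual record.
    """

    def __init__(self, observed):
        self._observed = frozenset(observed)

    def has_allele(self, query: AlleleQuery) -> bool:
        return query in self._observed


# Made-up dataset containing the allele used as an example in the text.
beacon = ToyBeacon([AlleleQuery("13", 32_936_732, "C")])

answer_yes = beacon.has_allele(AlleleQuery("13", 32_936_732, "C"))
answer_no = beacon.has_allele(AlleleQuery("13", 32_936_732, "T"))
```

A real Beacon would sit behind a web API and an authentication layer; the sketch only shows the shape of the question and the boolean answer.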
The Beacon Project is demonstrating the willingness of international organizations to work together to define standards for, and actively engage in, genomic data sharing. Several organizations have ‘lit’ (i.e., implemented) a Beacon, and these have been assembled into a single searchable network. In the years since the project’s inception, over 100 Beacons have been lit by 40 organizations serving over 200 datasets. The datasets served through Beacons are searchable individually or in aggregate, for instance via the Beacon Network (https://beacon-network.org), a federated search engine across the world’s Beacons. Beacons are a general-purpose protocol for genomics data discovery and have been lit by both large and small organizations, as well as by individuals. This has made available datasets collected from large-scale population sequencing efforts (for example, 1000 Genomes) [3], clinical diagnostic settings, in silico predictions (for example, PolyPhen-2) [4], expertly curated or crowd-sourced databases, scientific literature (for example, the Human Genome Mutation Database) [5] and variant curation efforts (for example, ClinVar) [6]. The International Cancer Genome Consortium [7] Beacon shares case-level somatic variant observations from over 60 cancer subtypes; the PhenomeCentral [8] Beacon shares observations from hundreds of clinical cases of undiagnosed and rare genetic diseases; and the BRCA Exchange (https://brcaexchange.org/) Beacon distributes consensus classifications for variants in BRCA1 and BRCA2 cataloged by the ENIGMA Consortium [9], as well as variants collected from other resources as part of the GA4GH BRCA Exchange. The ELIXIR hub (https://elixir-europe.org/) is also integrating Beacon to connect geographically distributed data centers and unify their data access methodologies. This will enable aggregate sharing of allelic observations between sites, a feature that is not yet available through its services.
With continued adoption, Beacons will produce a large network of globally searchable genomics datasets that have the potential to unlock new genomics-derived discoveries and applications in medicine.

Beacon protocol

Many former systems for genomic data sharing have followed a centralized model, wherein data generators deposit information into a single repository, such as the Sequence Read Archive (SRA) [10]. This model requires data generators to transfer whole copies of datasets over the internet, which will become inefficient and expensive as the rate of genomic data acquisition increases. An alternative, federated model for data sharing [1] requires organizations to host data independently and to interoperate via an agreed-upon technical language. This model removes the inefficiencies of large data transfers and gives host organizations more control over data privacy, security and representation.

For maximal interoperability, a Beacon is designed to be a communication layer that is compatible with any underlying representation of alleles or their annotations. For example, the GA4GH develops a data representation format for genomic variants and annotations, but in practice these data types may be stored in other formats as well (for example, VCF files or relational databases). Sharing through Beacon is notably different from sharing fully descriptive data representations for genomic variants (for example, VCF) or annotations (for example, GFF). The Beacon protocol considers levels of data aggregation and obfuscation that can be added onto raw data representations (such as VCF) to convey useful information without explicitly referring to specific samples or individuals. With these features in mind, the Beacon protocol was designed to be:

Simple: Beacons can be implemented on top of any underlying variant or variant annotation data store.
Federated: Beacons can be lit and maintained by individual organizations and assembled into a distributed network.
General purpose: Beacons can be used to distribute any allelic dataset, including case-level observations or other annotations.
Aggregative: Beacons provide a boolean answer to whether an allele was observed, possibly aggregated across an entire population, and therefore support deidentification in a way that sharing via VCF files does not.
Securable: Beacon access can be restricted using institutional security protocols, and authorization schemes can be implemented to respect conditions consented to by patients and/or data owners.

The Beacon API (represented as a RESTful web application) provides a technical specification that a Beacon server must implement. The specification is open-source and available online at https://github.com/ga4gh-beacon/specification. A Beacon has two available functions: the first lists information about the Beacon, including descriptions of the host organization and specific datasets that it serves; the second queries for the existence of information about specific alleles. Alleles are specified with chromosomal coordinates in addition to reference and alternate bases. Much as in their use in VCF, reference and alternate bases can be used together to specify exact matches for single nucleotide variants (SNVs) and small insertions or deletions. A Beacon responds either “yes” or “no” to signal whether the dataset(s) it serves have information about the queried allele. In the affirmative, a Beacon may optionally disclose metadata describing the observations or annotations associated with the queried allele. An example query and response are shown in Supplementary Fig. 1.

Reference implementation

To simplify the process of lighting a Beacon, a free, open-source ‘reference implementation’ of the latest specification has been developed. This implementation can create a public Beacon from a set of VCF files.
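To make the idea of building a Beacon from VCF data concrete, here is a minimal Python sketch. It is not the reference implementation: the function names `load_alleles` and `beacon_query` are hypothetical, the VCF records are made up, and positions are compared exactly as written in the VCF without any coordinate conversion, which is an assumption of this sketch.

```python
# Made-up, minimal VCF content used only for illustration.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
13\t32936732\t.\tG\tC\t.\tPASS\t.
13\t32900000\t.\tA\tAT,G\t.\tPASS\t.
"""


def load_alleles(vcf_text):
    """Collect (chromosome, position, ref, alt) tuples from VCF text.

    Sample columns are ignored, so only aggregate allele
    presence survives, not who carried the allele.
    """
    alleles = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # one entry per alternate allele
            alleles.add((chrom, int(pos), ref.upper(), alt.upper()))
    return alleles


def beacon_query(alleles, chrom, pos, ref, alt):
    """Exact match on coordinates plus reference and alternate bases."""
    return (chrom, pos, ref.upper(), alt.upper()) in alleles


alleles = load_alleles(VCF_TEXT)
snv_seen = beacon_query(alleles, "13", 32936732, "G", "C")
insertion_seen = beacon_query(alleles, "13", 32900000, "A", "AT")
not_seen = beacon_query(alleles, "13", 32936732, "G", "T")
```

Using the reference and alternate bases together, as the text describes, is what lets the same lookup distinguish a single nucleotide variant from a small insertion at the same position.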
The reference implementation may be deployed locally or in a cloud-based environment maintained by a third-party provider (for example, Amazon, Google or Microsoft). Documentation and links to download and run the Beacon reference implementation are available (https://github.com/ga4gh-beacon/). Third-party organizations, such as Cafe Variome, DNAstack and the European Genome-phenome Archive (EGA), also support the ability to light Beacons from genetic variation datasets stored in those systems.

Beacon security design

In principle, access to Beacons can be secured through any system of authentication or authorization, at the discretion of the host organization. The GA4GH is promoting different levels of data access (open, registered, and controlled) for convenience and for compatibility across its projects. Each so-called ‘access tier’ has distinct visibility and requirements for authorization. For example, ‘open access’ Beacons are accessible to anonymous users of the internet, whereas ‘registered access’ Beacons are accessible to registered users (for example, bona fide researchers and clinicians) who have agreed to a set of conditions of data use [11]. A Beacon may support one or more access tiers to provide progressive disclosure of increasingly sensitive information (for example, patient phenotypes and clinical information) as users pass through more stringent authentication and authorization checks. For example, tiered access makes it possible for organizations to allow anonymous users to discover the existence of an allelic observation, without the Beacon disclosing more information about it until users identify themselves. The ability of organizations to offer minimal data discovery up front can save substantial time and effort in data access applications when data might not contain relevant data points.
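The progressive disclosure described above can be sketched as a filter over response fields. This is only an illustration: the tier names follow the text, but the mapping of fields to tiers (`FIELD_TIERS`), the `disclose` function and the sample values are hypothetical choices, not part of any GA4GH policy or API.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Access tiers named in the text, ordered from least to most restricted."""
    OPEN = 0
    REGISTERED = 1
    CONTROLLED = 2


# Hypothetical response fields, each tagged with the lowest tier allowed to see it.
FIELD_TIERS = {
    "exists": Tier.OPEN,
    "allele_frequency": Tier.REGISTERED,
    "phenotypes": Tier.CONTROLLED,
}


def disclose(response, user_tier):
    """Keep only the fields the caller's tier is permitted to see."""
    return {
        field: value
        for field, value in response.items()
        if user_tier >= FIELD_TIERS[field]
    }


# Made-up full response held by the Beacon host.
full_response = {
    "exists": True,
    "allele_frequency": 0.002,
    "phenotypes": ["made-up phenotype"],
}

open_view = disclose(full_response, Tier.OPEN)
registered_view = disclose(full_response, Tier.REGISTERED)
controlled_view = disclose(full_response, Tier.CONTROLLED)
```

An anonymous caller learns only that the allele exists, while callers who pass stricter checks see progressively more of the same record.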
Beacon’s ability to reveal different information at specific access tiers affords genomic data stewards options for distributing allelic information, ranging from fully public to private. Access can be controlled using established authentication and authorization protocols (for example, OpenID Connect and OAuth 2.0) to enforce proportionate safeguards for datasets that may be sensitive and/or consented for use only by trusted individuals for specific purposes.

Attribute disclosure attacks and reidentification

The “yes” response from a Beacon signals the presence of an allele in a dataset comprising possibly many individuals’ genotypes, thereby mitigating risks associated with reidentifying specific individuals. Independent of their technical implementation, Beacon reidentification attempts require prior knowledge of genomic sequence data from the individual (or that of a close relative); they are arguably preceded by more harmful compromises to privacy. However, reidentification can pose additional risks if sensitive attributes about the individual can be inferred from Beacons (for example, HIV status or mental health condition). Such attacks have been characterized as “attribute disclosure attacks using DNA” (ADAD) [12]. Querying a Beacon for many variants known to exist in a person’s genome could lead to confirmation of that person’s inclusion in a given database, potentially revealing sensitive information about that individual. The ability to reidentify individuals has been examined previously [13] and recently in the context of Beacons [14]. The power to reidentify an individual whose genotypes are reflected through a Beacon depends on the number of individuals whose data are served, the allele frequency distribution of the pool, the scope of allowed queries (for example, exome versus genome), the type of DNA source (for example, normal tissue versus cancer sample) and the number of times a Beacon is queried.
Models for population allele frequencies can be leveraged to reduce the number of queries required in such an attempt, but reidentification is still possible without using allele frequencies if a Beacon can be queried a large number of times (for example, 10,000).

Risk mitigation schemes

User agreements, data use policies and technical enforcement of usage quotas can be established to limit the possibility of reidentification and ADAD through Beacons. Organizations are advised to specify terms of use that explicitly prohibit reidentification attempts through the service. When the risk of ADAD is considered too high for data to be distributed publicly, data stewards are encouraged to implement secured access. Compared with public-access tiers, secured-access tiers (either registered or controlled) impose extra social and/or legal disincentives that can help prevent service misuse.

Beacon operators may further specify consent-based data use conditions from a structured set of Consent Codes to impose restrictions indicated by consent of research participants. These Consent Codes, which are general purpose and can be used by genomics data stewards, including Beacon operators, were designed with the purpose of supporting maximum data use and integration while respecting consent permissions [15]. The current set of Consent Codes is provided in Supplementary Table 1.

The ethical, legal and social status of health-related data that are typically considered sensitive in international policy and laws is being examined to provide guidance in aggregating Beacons and in implementing tiered protection of Beacon attributes based on sensitivity [16]. This guidance aims to enable consistent and proportionate provision of data protection for data that are considered more sensitive by individuals and society. Data stewards should consider the sensitivity of attributes used in describing their Beacons, as well as those in the data itself.
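The technical enforcement of usage quotas mentioned among the risk mitigation schemes could take many forms. The sketch below assumes the simplest one, a fixed cap on the number of queries per user; the class names `QueryQuota` and `QuotaExceeded` and the cap value are hypothetical and are not drawn from any Beacon implementation.

```python
from collections import Counter


class QuotaExceeded(Exception):
    """Raised when a user has used up their allowance of queries."""


class QueryQuota:
    """Hypothetical per-user cap on the number of Beacon queries."""

    def __init__(self, max_queries):
        self.max_queries = max_queries
        self._used = Counter()  # user id -> queries already charged

    def charge(self, user_id):
        """Record one query and return how many remain for this user."""
        if self._used[user_id] >= self.max_queries:
            raise QuotaExceeded(f"user {user_id!r} has no queries left")
        self._used[user_id] += 1
        return self.max_queries - self._used[user_id]


quota = QueryQuota(max_queries=2)
remaining_after_first = quota.charge("user-1")
remaining_after_second = quota.charge("user-1")

try:
    quota.charge("user-1")
    third_query_allowed = True
except QuotaExceeded:
    third_query_allowed = False
```

Because reidentification relies on accumulating many answers, a cap set well below the number of queries an attack would need limits what any single account can learn; choosing that number is a policy decision outside the scope of this sketch.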
Technical provisions can also be used to reduce the statistical power of reidentification attempts. Individual Beacons can be combined to form a single, aggregate Beacon, and direct access to participating Beacons can be blocked. Aggregate Beacons contain more data points than any of the individual Beacons while obscuring the origin of the data. As an example, a publicly accessible Beacon named Conglomerate has been lit as an aggregate of multiple independent Beacons.

An information budgeting approach can also be used to thwart reidentification attempts [17], which rely on accumulating evidence from many queries for alleles carried by a specific individual. The power to reidentify an individual using this technique varies inversely with the frequency of the alleles being queried (i.e., very rare alleles are more revealing than common alleles). By metering the cumulative information disclosure for individuals, Beacons can be configured to restrict access before reidentification is possible within a desired level of statistical confidence.

Beacon is a general-purpose protocol for genomics data discovery, and as such can be used to distribute allelic information from various origins, including sequence observations from patients with known (for example, the International Cancer Genome Consortium) [7] or unknown (e.g., PhenomeCentral) [8] diseases, population studies (for example, 1000 Genomes) [3], in silico predictions (for example, PolyPhen-2) [4], expertly curated or crowdsourced databases (for example, BRCA Exchange and ClinVar) [6], and scientific literature (for example, the Human Genome Mutation Database) [5]. Additional Beacon implementations are ongoing in Europe, mainly through the ELIXIR Beacon project. The deployment of Beacons for select use cases is described below.

Matchmaking

A major obstacle to discovering the causes of rare diseases is sample size.
A single affected family can be enough to identify one or more compelling candidate variants, but pinpointing causal genetic variants frequently requires examining unrelated cases with a variant in the same gene and similar phenotypic presentations. Recently, patient matchmaking has been formalized through efforts such as the Matchmaker Exchange (MME) [18], in which users who contribute a case to a database within the federated network can find similar cases in other databases within the network. MME is a secured-access system in which only authorized databases and users can contribute and exchange patient profiles for matching. However, this inherently limits the discoverability of the data, which may dissuade some users who have candidate genes or variants that they want to match. In addition to implementing the MME API [19] for patient matchmaking, several organizations within the MME have lit Beacons to serve aggregate views of their clinical datasets more publicly. This allows clinicians with candidate variants to quickly search for existing matches within the MME.

Sequencing initiatives and archives

Large-scale sequencing initiatives, such as the 100,000 Genomes Project [20] conducted by Genomics England and the Precision Medicine Initiative [21], promise to generate vast volumes of genotypic and associated health information. Data from these projects, once shared, help researchers make inferences on the genetic determinants of disease by way of comparative analysis and association studies. The 1000 Genomes Project [3], NHLBI Grand Opportunity Exome Sequence Project (https://esp.gs.washington.edu/drupal/), and Exome Aggregation Consortium [22] are exemplar large-scale initiatives that have shared genotypes from diverse populations through Beacons. As the number and scale of population sequencing efforts expand, a more accurate depiction of global sequence diversity will be available in aggregate through Beacons and the Beacon Network.
In addition, many of the largest genomic archives, such as dbGaP [22], the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/home) and the European Variation Archive (http://www.ebi.ac.uk/eva), have provided access to variation data through Beacons for some or all of their datasets. These Beacons collectively provide widespread discoverability across a large amount of data. Many of these resources are continually growing with new submissions and thus provide added value for data depositors by simplifying data distribution and unifying their consumption.

Beacon Network

Beacon represents a simple protocol that, like internet protocols such as HTTP, describes a method for data discovery and exchange between distributed, collaborative systems. Toward developing an ‘internet for genomics’, it is useful to establish a network of protocol adopters and an efficient mechanism for searching across it. The Beacon Network is a directory and search engine for Beacons. Although individual Beacons answer the question “Have you observed this allele?”, the Beacon Network answers the question “Who has observed this allele?”. The Beacon Network serves as a powerful, convenient and real-time genomic data distribution channel through which users can discover the existence of alleles of interest and be directed to host organizations that have observed them. A schematic of the Beacon Network as a global federated network for genomic information discovery is shown in Fig. 1.

The Beacon Network is accessible either through its website or programmatically through an API, and enables fast, simultaneous search of hundreds of datasets from hundreds of thousands of individuals already served through Beacons worldwide. Beacons can be freely registered to the Beacon Network and can be searched independently or in aggregate with other connected Beacons. The Beacon Network has received over 1.5 million queries in the three years since its launch.
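The shift from “Have you observed this allele?” to “Who has observed this allele?” amounts to sending one question to every registered Beacon and collecting the names of those that answer yes. The sketch below assumes each Beacon is a local function returning a boolean; the Beacon names, their contents and the helpers `make_beacon` and `who_has_observed` are all hypothetical, and the real Beacon Network API is not modeled here.

```python
def make_beacon(observed):
    """Build a toy Beacon: a function that answers yes/no for an allele."""
    observed = frozenset(observed)
    return lambda allele: allele in observed


def who_has_observed(beacons, allele):
    """Ask every Beacon the same question and list those that say yes."""
    return sorted(name for name, ask in beacons.items() if ask(allele))


# Made-up directory; alleles are (chromosome, position, base) tuples.
BEACONS = {
    "beacon-a": make_beacon([("13", 32_936_732, "C")]),
    "beacon-b": make_beacon([("13", 32_936_732, "C"), ("1", 1_000, "T")]),
    "beacon-c": make_beacon([]),
}

hosts = who_has_observed(BEACONS, ("13", 32_936_732, "C"))
no_hosts = who_has_observed(BEACONS, ("2", 5, "A"))
```

The aggregated answer names host organizations rather than samples, so a searcher learns where to direct an access request without any Beacon revealing more than its own yes or no. A networked version would issue the requests concurrently; the loop here is sequential for simplicity.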
The value of datasets connected to the Beacon Network increases as more Beacons join, particularly for comparative applications like rare disease and donor matching.

Conclusions and perspectives

The first version of the Beacon Project has validated the feasibility of a globally federated system for genomic data sharing. The conceptual and technical simplicity of the discovery question, “Have you observed this allele?”, enabled rapid and widespread adoption, and this has served to provide practical feedback for the GA4GH to continue to advance its best practices by holistically addressing regulatory, security and technical aspects of global genomics data sharing. However, the narrow focus of the initial Beacon question limits its utility to support other closely related use cases, and successive iterations of the protocol are planned to enable coverage of these. Future extensions to the Beacon protocol may include the following:

Support for discovering complex genomic alterations, including copy number variations (CNVs) and somatic copy number alterations (CNAs), which are major contributors to both inter-individual variation and disease susceptibility and prominent features of the oncogenomic mutation landscape;
Integration of non-genomics data in queries, including the ability to discover similar cases on the basis of associated metadata;
Support for quantitative attributes in responses (for example, allele frequencies) to facilitate statistical analyses that combine information disclosed through multiple Beacons;
Handoff to services by which users may access additional information about a queried variant.

The development of data-rich extensions to the Beacon protocol will leverage the expertise of GA4GH members and stakeholders to iteratively design and evaluate the technical, privacy and security considerations in evolving Beacons to enable unprecedented access to genomics and clinical datasets through a global, federated ecosystem.
Supplementary Material

Supplementary information is available for this paper at https://doi.org/10.1038/s41587-019-0046-x.

          Most cited references


          ENIGMA--evidence-based network for the interpretation of germline mutant alleles: an international initiative to evaluate risk and clinical significance associated with sequence variation in BRCA1 and BRCA2 genes.

          As genetic testing for predisposition to human diseases has become an increasingly common practice in medicine, the need for clear interpretation of the test results is apparent. However, for many disease genes, including the breast cancer susceptibility genes BRCA1 and BRCA2, a significant fraction of tests results in the detection of a genetic variant for which disease association is not known. The finding of an "unclassified" variant (UV)/variant of uncertain significance (VUS) complicates genetic test reporting and counseling. As these variants are individually rare, a large collaboration of researchers and clinicians will facilitate studies to assess their association with cancer predisposition. It was with this in mind that the ENIGMA consortium (www.enigmaconsortium.org) was initiated in 2009. The membership is both international and interdisciplinary, and currently includes more than 100 research scientists and clinicians from 19 countries. Within ENIGMA, there are presently six working groups focused on the following topics: analysis, clinical, database, functional, tumor histopathology, and mRNA splicing. ENIGMA provides a mechanism to pool resources, exchange methods and data, and coordinately develop and apply algorithms for classification of variants in BRCA1 and BRCA2. It is envisaged that the research and clinical application of models developed by ENIGMA will be relevant to the interpretation of sequence variants in other disease genes. © 2011 Wiley Periodicals, Inc.

            Routes for breaching and protecting genetic privacy.

            We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.

              Framework for responsible sharing of genomic and health-related data

              Preamble The sharing of genomic and health-related data for biomedical research is of key importance in ensuring continued progress in our understanding of human health and wellbeing. The challenges raised by international, collaborative research require a principled but nevertheless practical Framework that brings together regulators, funders, patient groups, information technologists, industry, publishers, and research consortia to share principles about data exchange. Such a Framework will facilitate responsible research conduct. This Framework is developed under the auspices of the Global Alliance for Genomics and Health. Its mission is to accelerate progress in human health by helping to establish a common Framework of harmonized approaches to enable effective and responsible sharing of genomic and clinical data and to catalyze data sharing projects that drive and demonstrate the value of data sharing. This Framework provides guidance for the responsible sharing of human genomic and health-related data, including personal health data and other types of data that may have predictive power in relation to health. In particular, it highlights, and is guided by, Article 27 of the 1948 Universal Declaration of Human Rights. Article 27 guarantees the rights of every individual in the world “to share in scientific advancement and its benefits” (including to freely engage in responsible scientific inquiry), and at the same time “to the protection of the moral and material interests resulting from any scientific…production of which [a person] is the author”. (As set out in Appendix 1, many other international conventions and national laws, regulations, codes and policies also guide responsible data sharing behavior). This Framework is guided by the human rights of privacy, non-discrimination and procedural fairness. 
At the same time, it considers all human rights principles relevant, complementary and interrelated, founded as they are on respect for human dignity. Since science proceeds only with the broad support of society, respect for all persons is a primary driver underlying all other derived principles. In particular, this Framework establishes a set of foundational principles for responsible research conduct and oversight of research data systems in the realm of genomic and health-related data sharing. It interprets the right of all people to share in the benefits of scientific progress and its applications as being the duty of data producers and users to engage in responsible scientific inquiry and to access and share genomic and health-related data across the translation continuum, from basic research through practical applications. It recognizes the rights of data producers and users to be recognized for their contributions to research, balanced by the rights of those who donate their data. In addition to being founded on the right of all citizens in all countries to the benefits of the advancements of science, and on the right of attribution of scientists, it also reinforces the right of scientific freedom. The value of this Framework is that it: offers political and legal dimensions that reach beyond the moral appeals of bioethics and provides a more robust governance framework for genomic and health-related data sharing; speaks to groups and institutions, not just individuals; stresses the progressive realization of duties; and urges action by governments, industry, funders, publishers, and researchers to create an international environment for responsibly sharing data. This Framework will be elaborated by subsequent Policies (Appendix 2) on particular issues such as ethical governance, consent, privacy and security. 
The Framework and its subsequent Policies should be used in projects around the world (whether Global Alliance “inspired” or not) such that they become the tools that approval entities, recognized by different jurisdictions, will turn or refer to for guidance. Recognizing diversity of legal and ethical approaches and being responsive to emerging issues, both this Framework and its Policies are intended to provide leadership in this domain for wider discussion. Purpose and interpretation 1. Purpose The purpose of this Framework is to provide a principled and practical framework for the responsible sharing of genomic and health-related data. Its primary goals are to: i. Protect and promote the welfare, rights, and interests of individuals from around the world in genomic and health-related data sharing, particularly those who contribute their data for biomedical research; ii. Complement laws and regulations on privacy and personal data protection, as well as policies and codes of conduct for the ethical governance of research; iii. Foster responsible data sharing and oversight of research data systems; iv. Establish a framework for greater international data sharing, collaboration and good governance; v. Serve as a dynamic instrument that can respond to future developments in the science, technology, and practices of genomic and health-related data sharing; vi. Serve as a tool for the evaluation of responsible research by research ethics committees and data access committees; and vii. Provide overarching principles to be respected in developing legally-binding tools such as data access agreements. 2. Interpretation Without ascribing legal meaning, this Framework should be interpreted in good faith and is to be understood as a whole. The Foundational Principles and Core Elements are to be understood as complementary and interrelated, as appropriate and relevant in different contexts, countries and cultures. 
This Framework will be supported by Policies for guidance in particular issues such as, but not limited to, ethical governance, privacy and security, and consent. For the purposes of this Framework, “data sharing” includes data transfer or data exchange between data users, or where data are made available to secondary researchers, either openly or under specified access conditions. Application This Framework is intended for all entities or individuals providing, storing, accessing, managing or otherwise using genomic and health-related data, including data donors, users, and producers. This includes, but is not limited to, researchers, research participants and patient communities, publishers, research funding agencies, data protection authorities, hospitals, research ethics committees, industry, ministries of health, and public health organizations. Foundational principles The Foundational Principles of this Framework guide the responsible sharing of genomic and health-related data. They also facilitate compliance with the obligations and norms set by international and national law and policies. Foundational principles for responsible sharing of genomic and health-related data Respect Individuals, Families and Communities Advance Research and Scientific Knowledge Promote Health, Wellbeing and the Fair Distribution of Benefits Foster Trust, Integrity and Reciprocity Core elements of responsible data sharing It is good practice for those involved in genomic and health-related data sharing to have core elements of responsible data sharing in place. The following Core Elements of the Framework aid in the interpretation of the Foundational Principles to individuals and organizations involved in the sharing of genomic and health-related data. The Core Elements should be interpreted in a proportionate manner that acknowledges different levels of risk and community cultural practices. 
This Framework applies to use of data that have been consented to by donors (or their legal representatives) and/or approved for use by competent bodies or institutions in compliance with national and international laws, general ethical principles, and best practice standards that respect restrictions on downstream uses. Endorsement of the Framework does not preclude the development of particular guidance via Policies for specific populations (e.g. children) or issues (e.g. ethical governance, privacy and security, and consent). Core elements of responsible data sharing Transparency Develop clearly defined and accessible information on the purposes, processes, procedures and governance frameworks for data sharing. Such information should be presented in a way that is understandable and accessible in both digital and non-digital formats. Provide clear information on the purpose, collection, use and exchange of genomic and health-related data, including, but not limited to: data transfer to third parties; international transfer of data; terms of access; duration of data storage; identifiability of individuals and data and limits to anonymity or confidentiality of data; communication of results to individuals and/or groups; oversight of downstream uses of data; commercial involvement; proprietary claims; and processes of withdrawal from data sharing. Implement procedures for fairly determining requests for data access and/or exchange. Accountability Put in place systems for data sharing that respect this Framework. Track the chain of data access and/or exchange to its source. Develop processes to identify and manage conflicts of interest. Implement mechanisms for handling complaints related to data misuse; for identifying, reporting and managing breaches; and for instituting appropriate sanctions. 
Engagement Develop mechanisms to enable citizens to make meaningful contributions to biomedical research and to partake in deliberation on how these contributions can be respected. Facilitate deliberation about the wider societal implications of genomic and health-related data sharing among all stakeholders, especially citizens. Data quality and security Store and process the data collected, used and transferred in a way that is accurate, verifiable, unbiased, proportionate, and current, so as to enhance their interoperability and replicability and also preserve their long-term searchability and integrity. Ensure feedback mechanisms on the utility, quality, security, and accuracy of data, and their annotations, with a view to improving quality and interoperability and appropriate re-use by others. Establish proportionate data security measures that mitigate the risk of unauthorized access, data loss and misuse. Understand the issues related to lawful requests for data based on law enforcement, public health, or national security concerns. Privacy, data protection and confidentiality Comply with applicable privacy and data protection regulations at every stage of data sharing, and be in a position to provide assurances to citizens that confidentiality and privacy are appropriately protected when data are collected, stored, processed, and exchanged. Privacy and data protection safeguards should be proportionate to the nature and use of the data, whether identifiable, coded or anonymized. Forego any attempt to re-identify anonymized data unless where expressly authorized by law. Risk-benefit analysis Consider the realistic harms and benefits of data sharing on and with individuals, families and communities, including opportunity costs associated with both sharing and not sharing data. 
Potential realistic benefits may include development of new scientific knowledge and applications, enhanced efficiency, reproducibility and safety of research projects or processes, and more informed decisions about health care. Potential realistic harms may include invasions of privacy or breach of confidentiality and invalid conclusions about research projects. Conduct data sharing with a view towards minimizing harms and maximizing benefits to not just those who contribute their data, but also to society and health care systems as a whole, particularly where data pertains to disadvantaged people. Benefits arising from data sharing may not be uniformly distributed throughout communities around the world and may depend on the usability of data within a specified context, national priorities, as well as a specific community’s concern about health and interpretation of wellbeing. Undertake a proportionate assessment of the benefits and risks of harm in data sharing, which is periodically monitored according to the reasonable foreseeability of such harms and benefits. Such an assessment may also incorporate mechanisms that track subsequent harms, should they materialize, so as to help inform future policy. Recognition and attribution Design systems of data sharing with a view towards recognition and attribution that are meaningful and appropriate to the medium or discipline concerned and which provide due credit and acknowledgement of all who contributed to the results. Extend recognition and attribution both to primary purposes and, as appropriate, to secondary or downstream uses and applications. All parties should act in good faith to ensure that any connections to original sources of data are maintained where appropriate, to the extent permissible by law. 
Sustainability Ensure, where appropriate, the sustainability of the data generated for future use, through both archiving and using appropriate identification and retrieval systems, and through critical appraisal of the mechanisms and systems used for sharing genomic and health-related data. Education and training Dedicate education and training resources so as to advance data sharing and data management and to constantly improve data quality and integrity. Education and training resources should also be dedicated to: fostering and maintaining good records about the effects and impact of data sharing; raising awareness about national health priorities and distribution of health services; building capacity and data sharing infrastructure in countries; and, working towards the building of an evidence base about the advantages and potential limitations of data sharing. Accessibility and dissemination Make reasonable efforts to maximize the accessibility of data for research through lawful and proportionate data sharing. Promote collaborative partnerships and data sharing that can generate maximum benefit, along with the harmonization of deposit, management and access procedures and use as a means to promote accessibility. Seek to make data and research results widely available, including through publication and digital dissemination, whether positive, negative or inconclusive, depending on the nature and use of the data. Dissemination of data and research results should be conducted in a way that both promotes scientific collaboration, reproducibility and broad access to data, and yet minimizes obstacles to data sharing while minimizing harms and maximizing benefits to individuals, families and communities. Implementation mechanisms and amendments This Framework should be adopted by organizations and bodies involved in genomic and health-related data sharing. 
Organizations and bodies adhering to this Framework should take all reasonable and appropriate measures, whether of a regulatory, contractual, administrative or other character, to give effect to the Foundational Principles and Core Elements set out in this Framework in accordance with the international law of human rights and should, by means of all reasonable and appropriate measures, promote their implementation. Any persons, organizations or bodies adhering to this Framework may propose one or more amendments to the present Framework by communicating the amendments to the Regulatory and Ethics Working Group of the Global Alliance for Genomics and Health (the “REWG”). The REWG shall publicly circulate such amendments for comments and possible inclusion in this Framework. The REWG, in collaboration with biomedical, patient advocacy, and ethical and policy organizations and committees, will track the adoption of this Framework and its application through subsequent Policies. It will also routinely review its provisions, be aware of advances in basic research and technology, and ethical and legal developments, and attempt to ensure that this Framework is fit for purpose. 
Appendix 1

Foundational Human Rights Instruments
* Universal Declaration of Human Rights (UN 1948) (Article 27)
* International Covenant on Economic, Social and Cultural Rights (UN 1966) (Article 15)

Ethical and Legal Codes and Policies Guiding Data Sharing Behavior
* Constitution of the World Health Organization (WHO 1946)
* Bermuda Principles on Human Genome Sequencing (1996)
* Universal Declaration on the Human Genome and Human Rights (UNESCO 1997)
* Convention on Human Rights and Biomedicine (Council of Europe 1997)
* Statement on DNA Sampling: Control and Access (HUGO 1998)
* Statement on Human Genomic Databases (HUGO Ethics Committee 2002)
* Declaration of Ethical Considerations regarding Health Databases (WMA 2002)
* International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS, WHO 2002)
* Budapest Open Access Initiative (2002)
* Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility (Fort Lauderdale Statement, 2003)
* International Declaration on Human Genetic Data (UNESCO, IBC 2003)
* European Society of Human Genetics: Data Storage and DNA Banking for Biomedical Research (ESHG 2003)
* Universal Declaration on Bioethics and Human Rights (UNESCO 2005)
* Additional Protocol to the Convention on Human Rights and Biomedicine, concerning Biomedical Research (Council of Europe 2005)
* Recommendation Rec (2006) 4 of the Committee of Ministers to Member States on Research on Biological Materials of Human Origin (Council of Europe 2006)
* OECD Principles and Guidelines for Access to Research Data from Public Funding (OECD 2007)
* International Ethical Guidelines for Epidemiological Studies (CIOMS, WHO 2008)
* Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy (Amsterdam Principles, 2008)
* Guidelines for Human Biobanks and Genetic Research Databases (OECD 2008, 2009)
* Toronto Statement on Prepublication Data Sharing (2009)
* Joint Statement by Funders of Health Research (2011)
* 2012 Best Practices for Repositories: Collection, Storage, Retrieval and Distribution of Biological Material for Research (ISBER 2012)
* Responsible Conduct in the Global Research Enterprise: A Policy Report (InterAcademy Council 2012)
* Declaration of Helsinki (WMA 2013)
* Guidelines governing the Protection of Privacy and Transborder Flows of Personal Data (OECD 2013)

Appendix 2

Figure 1. Global Alliance for Genomics and Health (GA4GH): proposed policy template.

                Author and article information

                Journal
                9604648
                Nat Biotechnol
                Nat. Biotechnol.
                Nature biotechnology
                1087-0156
                1546-1696
                01 March 2019
                30 August 2019
                05 September 2019
                Volume: 37
                Issue: 3
                Pages: 220-224
                Affiliations
                [1 ]DNAstack, Toronto, Ontario, Canada
                [2 ]Global Alliance for Genomics and Health, Toronto, Ontario, Canada
                [3 ]European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
                [4 ]Centre de Regulació Genòmica, Barcelona, Spain
                [5 ]Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, Quebec, Canada
                [6 ]Department of Genetics, University of Leicester, Leicester, UK
                [7 ]Genecloud, Sunnyvale, CA, USA
                [8 ]ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
                [9 ]Ontario Institute for Cancer Research, Toronto, Ontario, Canada
                [10 ]Genomics Institute, University of California at Santa Cruz, Santa Cruz, CA, USA
                [11 ]Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
                [12 ]SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
                [13 ]CSC – IT Center for Science Ltd, Espoo, Finland
                [14 ]Broad Institute of MIT and Harvard, Cambridge, MA, USA
                [15 ]Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
                [16 ]National Center for Biotechnology Information, US National Library of Medicine, Bethesda, MD, USA
                Author notes
                Article
                EMS84242
                10.1038/s41587-019-0046-x
                6728157
                30833764
                cc2f0696-8d4e-4c35-a68a-837672046db1

                This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Categories
                Article

                Biotechnology
