
      Data Safe Havens in health research and healthcare

      review-article


          Abstract

          Motivation: The data that put the ‘evidence’ into ‘evidence-based medicine’ are central to developments in public health, primary and hospital care. A fundamental challenge is to site such data in repositories that can easily be accessed under appropriate technical and governance controls which are effectively audited and are viewed as trustworthy by diverse stakeholders. This demands socio-technical solutions that may easily become enmeshed in protracted debate and controversy as they encounter the norms, values, expectations and concerns of diverse stakeholders. In this context, the development of what are called ‘Data Safe Havens’ has been crucial. Unfortunately, the origins and evolution of the term have led to a range of different definitions being assumed by different groups. There is, however, an intuitively meaningful interpretation that is often assumed by those who have not previously encountered the term: a repository in which useful but potentially sensitive data may be kept securely under governance and informatics systems that are fit-for-purpose and appropriately tailored to the nature of the data being maintained, and may be accessed and utilized by legitimate users undertaking work and research contributing to biomedicine, health and/or to ongoing development of healthcare systems.

          Results: This review explores a fundamental question: ‘what are the specific criteria that ought reasonably to be met by a data repository if it is to be seen as consistent with this interpretation and viewed as worthy of being accorded the status of “Data Safe Haven” by key stakeholders?’ We propose 12 such criteria.

          Contact: paul.burton@bristol.ac.uk

          Most cited references (5)

          Routes for breaching and protecting genetic privacy.

          We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.

            DataSHIELD: taking the analysis to the data, not the data to the analysis

            Background: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK’s proposed ‘care.data’ initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data.

            Methods: Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC.

            Results: Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach.

            Conclusions: DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property—the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but the physical size of the data precludes direct transfer to a new site for analysis.

              DataSHIELD: resolving a conflict in contemporary bioscience—performing a pooled analysis of individual-level data without sharing the data

              Background: Contemporary bioscience sometimes demands vast sample sizes and there is often then no choice but to synthesize data across several studies and to undertake an appropriate pooled analysis. This same need is also faced in health-services and socio-economic research. When a pooled analysis is required, analytic efficiency and flexibility are often best served by combining the individual-level data from all sources and analysing them as a single large data set. But ethico-legal constraints, including the wording of consent forms and privacy legislation, often prohibit or discourage the sharing of individual-level data, particularly across national or other jurisdictional boundaries. This leads to a fundamental conflict in competing public goods: individual-level analysis is desirable from a scientific perspective, but is prevented by ethico-legal considerations that are entirely valid.

              Methods: Data aggregation through anonymous summary-statistics from harmonized individual-level databases (DataSHIELD), provides a simple approach to analysing pooled data that circumvents this conflict. This is achieved via parallelized analysis and modern distributed computing and, in one key setting, takes advantage of the properties of the updating algorithm for generalized linear models (GLMs).

              Results: The conceptual use of DataSHIELD is illustrated in two different settings.

              Conclusions: As the study of the aetiological architecture of chronic diseases advances to encompass more complex causal pathways—e.g. to include the joint effects of genes, lifestyle and environment—sample size requirements will increase further and the analysis of pooled individual-level data will become ever more important. An aim of this conceptual article is to encourage others to address the challenges and opportunities that DataSHIELD presents, and to explore potential extensions, for example to its use when different data sources hold different data on the same individuals.

                Author and article information

                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803, 1367-4811
                Publication dates: 15 October 2015; 25 June 2015
                Volume: 31
                Issue: 20
                Pages: 3241-3248
                Affiliations
                1 Data to Knowledge (D2K) Research Group, University of Bristol, Oakfield House, Oakfield Grove, Clifton, Bristol BS8 2BN, UK,
                2 Public Population Project in Genomics and Society (P3G), Montreal, QC H3A 0G1, Canada,
                3 Department of Computer Science, University of Toronto, Sandford Fleming Building, Toronto, ON M5S 3G4, Canada,
                4 JK Mason Institute for Medicine, Life Sciences and the Law, School of Law, University of Edinburgh, Old College, South Bridge, Edinburgh EH8 9YL, UK,
                5 Department of Health Sciences, University of Leicester, Adrian Building, University Road, Leicester LE1 7RH, UK,
                6 School of Epidemiology, Public Health and Preventive Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada,
                7 Center for Genetic Medicine and Surgery, Northwestern University, Rubloff Building, 750 N Lake Shore, Chicago, IL 60611, USA,
                8 Department of Public Health and General Practice, Norwegian University of Science and Technology, Postboks 8905, 7401 Trondheim, Norway,
                9 C3 Collaborating for Health, 7-14 Great Dover Street, London SE1 4YR,
                10 MRC Medical Bioinformatics Centre, Leeds Institute of Health Sciences, University of Leeds, Charles Thackrah Building, 101 Clarendon Road, Leeds LS2 9LJ,
                11 University of Cambridge, Wolfson College, Cambridge CB3 9BB, UK and
                12 Centre of Genomics and Policy, McGill University, Montreal, QC H3A 0G1, Canada
                Author notes
                *To whom correspondence should be addressed.

                The authors wish it to be known that, in their opinion, the first three authors and last author should be regarded as Joint First Authors.

                Associate Editor: Jonathan Wren

                Article
                Publisher ID: btv279
                DOI: 10.1093/bioinformatics/btv279
                PMC ID: 4595892
                PubMed ID: 26112289
                119e0b22-b5e4-46e8-827b-f3958fb170b4
                © The Author 2015. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 17 November 2014
                Revised: 7 April 2015
                Accepted: 27 April 2015
                Page count
                Pages: 8
                Categories
                Review
                Structural Bioinformatics

                Bioinformatics & Computational biology
