      Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes

      research-article

          Abstract

          We develop an algorithm for probabilistic linkage of de-identified research datasets at the patient level, when only diagnosis codes with discrepancies and no personal health identifiers such as name or date of birth are available. It relies on Bayesian modelling of binarized diagnosis codes, and provides a posterior probability of matching for each patient pair, while considering all the data at once. Both in our simulation study (using an administrative claims dataset for data generation) and in two real use-cases linking patient electronic health records from a large tertiary care network, our method exhibits good performance and compares favourably to the standard baseline Fellegi-Sunter algorithm. We propose a scalable, fast and efficient open-source implementation in the ludic R package available on CRAN, which also includes the anonymized diagnosis code data from our real use-case. This work suggests it is possible to link de-identified research databases stripped of any personal health identifiers using only diagnosis codes, provided sufficient information is shared between the data sources.
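
The idea of a posterior matching probability computed from binarized diagnosis codes can be illustrated with a small sketch. This is not the authors' algorithm and not the ludic API: it scores a single pair in isolation with a naive-Bayes style likelihood ratio, whereas the abstract describes a method that considers all the data at once. The function name `match_posterior`, the discrepancy rate `eps`, the per-code `prevalence` values and the `prior` are all assumptions made for illustration.

```python
import math


def match_posterior(a, b, prevalence, eps=0.05, prior=0.01):
    """Posterior probability that two binary code vectors describe one patient.

    Illustrative assumptions (not taken from the article):
    - codes are independent given match status,
    - each recorded code differs from the patient's true status with
      probability eps (0 < eps < 0.5),
    - prevalence[k] is the true frequency of code k (strictly between 0 and 1),
    - prior is the prior probability that a random pair is a match.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for x, y, p in zip(a, b, prevalence):
        # Probability of observing value v when the true status is t.
        def obs(v, t):
            return 1.0 - eps if v == t else eps

        # Match: both records are noisy copies of the same true status.
        lik_match = p * obs(x, 1) * obs(y, 1) + (1.0 - p) * obs(x, 0) * obs(y, 0)

        # Non-match: the two records are independent draws from the marginal.
        q = p * (1.0 - eps) + (1.0 - p) * eps
        lik_non = (q if x == 1 else 1.0 - q) * (q if y == 1 else 1.0 - q)

        log_odds += math.log(lik_match / lik_non)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    prevalence = [0.05, 0.05, 0.05, 0.05]  # hypothetical rare codes
    rec_a = [1, 0, 1, 0]
    rec_b = [1, 0, 1, 0]  # agrees with rec_a on every code
    rec_c = [0, 1, 0, 1]  # disagrees with rec_a on every code
    print(match_posterior(rec_a, rec_b, prevalence))
    print(match_posterior(rec_a, rec_c, prevalence))
```

In this toy model, agreement on a rare code raises the match probability much more than shared absence of that code, which is one intuition for why linkage needs sufficient information shared between the data sources.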

                Author and article information

                Journal
                Scientific Data (Sci Data)
                Publisher: Nature Publishing Group
                ISSN: 2052-4463
                Published: 08 January 2019
                Volume: 6
                Article number: 180298
                Affiliations
                [1] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
                [2] University Bordeaux, ISPED, Inserm Bordeaux Population Health Research Center, UMR 1219, Inria SISTM, Bordeaux F-33000, France
                [3] Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
                [4] Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA, USA
                [5] Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, MA, USA
                [6] Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
                [7] Research IS and Computing, Partners HealthCare, Charlestown, MA, USA
                Author notes

                B.P.H. developed and implemented the linkage method, analysed the data, drafted the manuscript and approved its final version. G.M.W. contributed to the method development and to the analysis of the data, drafted the manuscript and approved its final version. K.P.L. contributed to the analysis of the data, drafted the manuscript and approved its final version. N.P.P. provided the claims data for the simulation study and approved the final version of the manuscript. S.C. contributed to drafting the manuscript and approved its final version. N.A.S. provided the BRASS registry cohort data and approved the final version of the manuscript. P.S. contributed to the method development and to drafting the manuscript, and approved its final version. S.N.M. contributed to the method development and to drafting the manuscript, and approved its final version. I.S.K. contributed to the method development and to drafting the manuscript, and approved its final version. T.C. developed the linkage method, contributed to the analysis of the data, drafted the manuscript and approved its final version.

                Author information
                http://orcid.org/0000-0001-8411-6403
                Article
                Publisher ID: sdata2018298
                DOI: 10.1038/sdata.2018.298
                PMC ID: 6326114
                PubMed ID: 30620344
                Copyright © 2019, The Author(s)

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

                History
                Received: 29 January 2018
                Accepted: 26 November 2018
                Categories
                Article

                medical research, diagnosis
