      Is Open Access

      Multiple imputation for analysis of incomplete data in distributed health data networks

      research-article


          Abstract

          Distributed health data networks (DHDNs) leverage data from multiple sources or sites, such as electronic health records (EHRs) from multiple healthcare systems, and have drawn increasing interest in recent years because they do not require sharing of subject-level data and hence considerably lower the hurdles for collaboration between institutions. However, DHDNs face a number of challenges in data analysis, particularly in the presence of missing data. The current state-of-the-art methods for handling incomplete data require pooling data into a central repository before analysis, which is not feasible in DHDNs. In this paper, we address the missing data problem in distributed environments such as DHDNs, which has not been investigated previously. We develop communication-efficient distributed multiple imputation methods for incomplete data that are horizontally partitioned. Because subject-level data are not shared or transferred outside of each site in the proposed methods, they enhance protection of patient privacy and have the potential to strengthen public trust in analysis of sensitive health data. We investigate the performance of these methods through extensive simulation studies. Our methods are applied to the analysis of an acute stroke dataset collected from multiple hospitals, mimicking a DHDN where health data are horizontally partitioned across hospitals and subject-level data cannot be shared or sent to a central data repository.

          Abstract

          Distributed health data networks (DHDNs) leverage data from multiple healthcare systems, but often face major analytical challenges in the presence of missing data. This paper develops distributed multiple imputation methods that do not require sharing subject-level data across health systems.
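
To make the idea of imputing horizontally partitioned data without sharing subject-level records more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single continuous variable `y` with missing values, fully observed covariates `X`, and a normal linear imputation model. Each site sends only aggregate sufficient statistics to a coordinator, which draws imputation-model parameters and sends them back so that each site can fill in its own missing values locally. All function names (`site_summaries`, `combine`, `draw_parameters`, `impute_locally`) and the simulated data are hypothetical.

```python
import numpy as np

def site_summaries(X, y):
    """Run locally at one site; only these aggregates leave the site."""
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    return Xo.T @ Xo, Xo.T @ yo, float(yo @ yo), int(obs.sum())

def combine(summaries):
    """Run at the coordinator: sum per-site aggregates, then fit least squares."""
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    yty = sum(s[2] for s in summaries)
    n = sum(s[3] for s in summaries)
    beta_hat = np.linalg.solve(XtX, Xty)
    rss = yty - beta_hat @ Xty  # residual sum of squares from aggregates only
    return beta_hat, rss, XtX, n

def draw_parameters(beta_hat, rss, XtX, n, rng):
    """Draw (beta, sigma^2) from the usual normal-linear-model posterior."""
    p = len(beta_hat)
    sigma2 = rss / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(XtX))
    return beta, sigma2

def impute_locally(X, y, beta, sigma2, rng):
    """Run locally at one site: fill in missing y using the drawn parameters."""
    y_imp = y.copy()
    miss = np.isnan(y)
    y_imp[miss] = X[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return y_imp

# Simulated example (assumed data): three sites, about 20% of y missing at each.
rng = np.random.default_rng(0)
true_beta = np.array([1.0, 2.0])
sites = []
for n_site in (50, 80, 120):
    X = np.column_stack([np.ones(n_site), rng.normal(size=n_site)])
    y = X @ true_beta + rng.normal(size=n_site)
    y[rng.random(n_site) < 0.2] = np.nan
    sites.append((X, y))

beta_hat, rss, XtX, n = combine([site_summaries(X, y) for X, y in sites])

M = 5  # number of imputed datasets
completed = []
for _ in range(M):
    beta, sigma2 = draw_parameters(beta_hat, rss, XtX, n, rng)
    completed.append([impute_locally(X, y, beta, sigma2, rng) for X, y in sites])
```

In this sketch, the summed statistics give the same least-squares fit that pooling the observed rows would give, yet only a p-by-p matrix, a p-vector, and two scalars leave each site. The `M` completed datasets would then be analyzed separately and combined with Rubin's rules; that analysis step would itself need a distributed implementation and is not shown here.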


                Author and article information

                Contributors
                qlong@upenn.edu
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 29 October 2020
                Volume: 11
                Article number: 5467
                Affiliations
                [1] University of Pennsylvania, Philadelphia, PA, USA (GRID grid.25879.31; ISNI 0000 0004 1936 8972)
                [2] Emory University, Atlanta, GA, USA (GRID grid.189967.8; ISNI 0000 0001 0941 6502)
                [3] University of Texas Health Science Center at Houston, Houston, TX, USA (GRID grid.267308.8; ISNI 0000 0000 9206 2401)
                Author information
                http://orcid.org/0000-0003-3426-1295
                http://orcid.org/0000-0001-9933-2205
                http://orcid.org/0000-0003-0660-5230
                Article
                Publisher ID: 19270
                DOI: 10.1038/s41467-020-19270-2
                PMCID: PMC7596726
                PMID: 33122624
                d41cd9ca-0c1a-4bb4-bd50-1522e97848f1
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 15 April 2020
                Accepted: 2 October 2020
                Funding
                Funded by: FundRef https://doi.org/10.13039/100000057, U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS);
                Award ID: R01GM124111
                Categories
                Article
                Uncategorized
                computational science, statistics
