5
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Objective

          Our objective is to create a source of synthetic electronic health records that is readily available; suited to industrial, innovation, research, and educational uses; and free of legal, privacy, security, and intellectual property restrictions.

          Materials and Methods

          We developed Synthea, an open-source software package that simulates the lifespans of synthetic patients, modeling the 10 most frequent reasons for primary care encounters and the 10 chronic conditions with the highest morbidity in the United States.

          Results

          Synthea adheres to a previously developed conceptual framework, scales via open-source deployment on the Internet, and may be extended with additional disease and treatment modules developed by its user community. One million synthetic patient records are now freely available online, encoded in standard formats (eg, Health Level-7 [HL7] Fast Healthcare Interoperability Resources [FHIR] and Consolidated-Clinical Document Architecture), and accessible through an HL7 FHIR application program interface.

          Discussion

          Health care lags other industries in information technology, data exchange, and interoperability. The lack of freely distributable health records has long hindered innovation in health care. Approaches and tools are available to inexpensively generate synthetic health records at scale without accidental disclosure risk, lowering current barriers to entry for promising early-stage developments. By engaging a growing community of users, the synthetic data generated will become increasingly comprehensive, detailed, and realistic over time.

          Conclusion

          Synthetic patients can be simulated with models of disease progression and corresponding standards of care to produce risk-free realistic synthetic health care records at scale.

          Related collections

          Most cited references34

          • Record: found
          • Abstract: found
          • Article: not found

          Identifying personal genomes by surname inference.

          Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            A Systematic Review of Re-Identification Attacks on Health Data

            Background Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant and important implications on how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast that to re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods. Methods and Findings Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26 with 95% CI 0.046–0.478) and 0.34 for attacks on health data (95% CI 0–0.744). There was considerable uncertainty around the proportions as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of fourteen attacks were performed with data that was de-identified using existing standards. Only one of these attacks was on health data, which resulted in a success rate of 0.00013. Conclusions The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              A review of feature selection methods on synthetic data

                Bookmark

                Author and article information

                Journal
                J Am Med Inform Assoc
                J Am Med Inform Assoc
                jamia
                Journal of the American Medical Informatics Association : JAMIA
                Oxford University Press
                1067-5027
                1527-974X
                March 2018
                30 August 2017
                30 August 2017
                : 25
                : 3
                : 230-238
                Affiliations
                [1 ]The MITRE Corporation, Bedford, MA, USA
                [2 ]HIKER Group, Massey University, Palmerston North, New Zealand,
                [3 ]Department of Applied Computing and Engineering Technology, University of Montana, Missoula, MT, USA
                Author notes
                Corresponding Author: Jason Walonoski, 202 Burlington Road, Bedford, MA 01730, USA. E-mail: jwalonoski@ 123456mitre.org . Phone: +1-781-271-2021
                Article
                ocx079
                10.1093/jamia/ocx079
                7651916
                29025144
                3462ec3b-a867-44b5-8b16-eac214cbabee
                © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 10 May 2017
                : 15 June 2017
                : 5 July 2017
                Page count
                Pages: 9
                Categories
                Research and Applications

                Bioinformatics & Computational biology
                electronic health records,computer simulation,patient-specific modeling,clinical pathways,rs-ehr

                Comments

                Comment on this article