      Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading

      research-article

          Abstract

          Background

          Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to transform data to conform to network requirements are beyond the reach of many health care organizations, and there is a need for practical tools that lower the barriers to contributing data to clinical research networks.

          Methods

          We designed and implemented a health data transformation and loading approach, which we refer to as Dynamic ETL (Extraction, Transformation and Loading) (D-ETL), that automates part of the process through the use of scalable, reusable and customizable code, while retaining the manual aspects of the process that require knowledge of complex coding syntax. This approach provides the flexibility required for the ETL of heterogeneous data, accommodates variations in semantic expertise, and keeps the transformation logic transparent, all of which are essential to implementing ETL conventions across clinical research sharing networks. Processing workflows are directed by the ETL specifications guideline, developed by ETL designers with extensive knowledge of the structure and semantics of health data (i.e., “health data domain experts”) and of the target common data model.

          Results

          D-ETL was implemented to perform ETL operations that load data from various sources with different database schema structures into the Observational Medical Outcomes Partnership (OMOP) common data model. The results showed that the ETL rule composition methods and the D-ETL engine offer a scalable solution for health data transformation, using automatic query generation to harmonize source datasets.
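To make the idea of rule-driven query generation concrete, the following is a minimal sketch and not the D-ETL engine itself. The rule layout, the `patients` source table, its columns, and the helper names `generate_insert` and `run_demo` are all hypothetical; only the general pattern (declarative mapping rules turned into a generated SQL statement that loads an OMOP-style `person` table) comes from the abstract. SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical ETL rules: each rule maps one source expression to one target column.
# The rule fields and all source table/column names are assumptions for illustration.
RULES = [
    {"target_table": "person", "target_column": "person_id",
     "source_table": "patients", "source_expr": "patients.pat_id"},
    {"target_table": "person", "target_column": "year_of_birth",
     "source_table": "patients",
     "source_expr": "CAST(substr(patients.dob, 1, 4) AS INTEGER)"},
    {"target_table": "person", "target_column": "gender_concept_id",
     "source_table": "patients",
     # 8507 and 8532 are the OMOP standard concepts for male and female.
     "source_expr": "CASE patients.sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END"},
]


def generate_insert(rules, target_table):
    """Build one INSERT ... SELECT statement from the rules for a target table."""
    selected = [r for r in rules if r["target_table"] == target_table]
    if not selected:
        raise ValueError(f"no rules defined for target table {target_table!r}")
    sources = {r["source_table"] for r in selected}
    if len(sources) != 1:
        # Joins across several source tables are out of scope for this sketch.
        raise ValueError("this sketch supports a single source table per target")
    columns = ", ".join(r["target_column"] for r in selected)
    exprs = ", ".join(f'{r["source_expr"]} AS {r["target_column"]}' for r in selected)
    return f"INSERT INTO {target_table} ({columns}) SELECT {exprs} FROM {sources.pop()}"


def run_demo():
    """Load two made-up source rows into a simplified person table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (pat_id INTEGER, dob TEXT, sex TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [(1, "1980-05-17", "F"), (2, "1975-11-02", "M")],
    )
    conn.execute(
        "CREATE TABLE person "
        "(person_id INTEGER, year_of_birth INTEGER, gender_concept_id INTEGER)"
    )
    conn.execute(generate_insert(RULES, "person"))
    rows = conn.execute(
        "SELECT person_id, year_of_birth, gender_concept_id "
        "FROM person ORDER BY person_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(generate_insert(RULES, "person"))
    print(run_demo())
```

Keeping the rules as plain data is what makes the transformation logic inspectable by domain experts who do not write SQL; a real engine would also need to handle joins, value mappings held in lookup tables, and validation of the rules, none of which this sketch attempts.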

          Conclusions

          D-ETL supports a flexible and transparent process to transform and load health data into a target data model. This approach lowers the technical barriers that prevent data partners from participating in research data networks and therefore promotes the advancement of comparative effectiveness research using secondary electronic health data.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12911-017-0532-3) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                +1 303-724-9919, Toan.Ong@ucdenver.edu
                Michael.kahn@ucdenver.edu
                bethany.kwan@ucdenver.edu
                traci.yamashita@ucdenver.edu
                elias.brandt@dartnet.info
                patrick.hosokawa@ucdenver.edu
                cuhrich@osrdata.com
                lisa.schilling@ucdenver.edu
                Journal
                BMC Medical Informatics and Decision Making (BMC Med Inform Decis Mak)
                BioMed Central (London)
                ISSN: 1472-6947
                13 September 2017
                Volume 17, Article 134
                Affiliations
                [1] ISNI 0000 0001 0703 675X, GRID grid.430503.1, Departments of Pediatrics, University of Colorado Anschutz Medical Campus, School of Medicine; Building AO1 Room L15-1414, 12631 East 17th Avenue, Mail Stop F563, Aurora, CO 80045 USA
                [2] ISNI 0000 0001 0703 675X, GRID grid.430503.1, Departments of Family Medicine, University of Colorado Anschutz Medical Campus, School of Medicine; Aurora, CO USA
                [3] ISNI 0000 0001 0703 675X, GRID grid.430503.1, Departments of Medicine, University of Colorado Anschutz Medical Campus, School of Medicine; Aurora, CO USA
                [4] ISNI 0000 0001 0703 675X, GRID grid.430503.1, Colorado Clinical and Translational Sciences Institute, University of Colorado Anschutz Medical Campus, School of Medicine; Aurora, CO USA
                [5] DARTNet Institute, Aurora, CO USA
                [6] OSR Data Corporation, Lincoln, MA USA
                Article
                Publisher ID: 532
                DOI: 10.1186/s12911-017-0532-3
                PMCID: PMC5598056
                PMID: 28903729
                f1deae03-2135-4e75-9ade-99f9a053420d
                © The Author(s). 2017

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 26 May 2016
                Accepted: 31 August 2017
                Funding
                Funded by: Agency for Healthcare Research and Quality (FundRef http://dx.doi.org/10.13039/100000133)
                Award ID: 1R01HS019908
                Award ID: 1R01HS022956
                Categories
                Technical Advance

                Bioinformatics & Computational biology
                electronic health records, extraction, transformation and loading, distributed research networks, data harmonization, rule-based ETL
