      A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system

          Abstract

          Background

          Healthcare providers around the world generate huge amounts of biomedical data, stored either in legacy (paper-based) formats or in electronic medical records (EMR); collectively, these are referred to as big biomedical data (BBD). To realize the promise of BBD for clinical use and research, an essential step is to extract key data elements from unstructured medical records into patient-centered electronic health records with computable data elements. Our objective is to introduce a novel solution, known as the double-reading/entry system (DRESS), for extracting clinical data from unstructured medical records (MR) and creating a semi-structured electronic health record database, and to demonstrate its reproducibility empirically.

          Methods

          Using modern cloud-based technologies, we have developed a comprehensive system comprising multiple subsystems: capturing MRs in clinics, securely transferring MRs, storing and managing MRs in the cloud, facilitating both machine learning and manual reading, and performing iterative quality control before committing the semi-structured data to the target database. To evaluate the reproducibility of the medical data elements extracted by DRESS, we conducted a blinded reproducibility study with 100 MRs from patients who had undergone surgical treatment of lung cancer in China. The study uses the Kappa statistic to measure concordance of discrete variables and the correlation coefficient to measure reproducibility of continuous variables.
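The two reproducibility measures named above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes Cohen's kappa for two readers and Pearson's correlation coefficient (the abstract does not say which variants were used), and the function names `cohen_kappa` and `pearson_r` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two equal-length lists of categorical labels.

    Assumption: two readers, unweighted kappa.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(reader_a)
    # Observed agreement: share of records on which both readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of the readers' marginal label frequencies.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical example values, not data from the study.
stage_first_read = ["I", "II", "II", "III"]
stage_second_read = ["I", "II", "III", "III"]
kappa = cohen_kappa(stage_first_read, stage_second_read)

size_first_read = [1.0, 2.0, 3.0]
size_second_read = [2.0, 4.0, 6.0]
r = pearson_r(size_first_read, size_second_read)
```

A discrete field such as tumor stage would go through `cohen_kappa`, while a continuous field such as tumor size would go through `pearson_r`; the field choices here are illustrative only.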

          Results

          Using DRESS, we have demonstrated the feasibility of extracting clinical data from unstructured MRs to create a semi-structured, patient-centered electronic health record database. The reproducibility study with 100 patients' MRs showed a high overall reproducibility of 98%, with variation across the six modules (pathology, radio/chemotherapy, clinical examination, surgery information, medical image, and general patient information).

          Conclusions

          DRESS uses double reading, double entry, and independent adjudication to manually curate structured data elements from unstructured clinical data. Further, through distributed computing strategies, DRESS protects data privacy by dividing MR data into de-identified modules. Finally, through an internet-based computing cloud, DRESS enables many data specialists to work in a virtual environment and achieve the scale needed to process thousands of MRs within days. This hybrid system is probably a workable solution to the big medical data challenge.
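The double-entry and adjudication idea described above can be illustrated with a short sketch. It is an assumption-laden toy, not the DRESS implementation: the function `reconcile`, the `adjudicate` callback, and the field names are all hypothetical, and each entry is modeled as a plain dictionary for one de-identified module.

```python
def reconcile(entry_a, entry_b, adjudicate):
    """Merge two independent entries of the same record module.

    Fields on which both readers agree are accepted as-is. Every other
    field is passed to `adjudicate(field, value_a, value_b)`, which stands
    in for an independent adjudicator and returns the final value.
    Returns the merged record and the list of disputed field names.
    """
    final, disputed = {}, []
    for field in sorted(set(entry_a) | set(entry_b)):
        value_a, value_b = entry_a.get(field), entry_b.get(field)
        if value_a == value_b:
            final[field] = value_a
        else:
            final[field] = adjudicate(field, value_a, value_b)
            disputed.append(field)
    return final, disputed


# Hypothetical pathology-module entries from two independent readers.
first_entry = {"histology": "adenocarcinoma", "tumor_size_cm": 2.5}
second_entry = {"histology": "adenocarcinoma", "tumor_size_cm": 2.8}


def prefer_first(field, value_a, value_b):
    # Placeholder adjudicator: a real one would re-read the source record.
    return value_a


merged, disputed_fields = reconcile(first_entry, second_entry, prefer_first)
```

Tracking the disputed fields separately makes it possible to report per-module agreement rates, which is one way a reproducibility figure could be derived from such a workflow.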

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12911-016-0357-5) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                ligangluo@linkdoc.com
                lily@linkdoc.com
                hujiajia@linkdoc.com
                wxz@linkdoc.com
                tony@linkdoc.com
                lzhao@fhcrc.org
                Journal
                BMC Medical Informatics and Decision Making (BMC Med Inform Decis Mak)
                BioMed Central (London)
                ISSN: 1472-6947
                Published: 30 August 2016
                2016; 16(1): 114
                Affiliations
                [1] LinkDoc Inc, 8 Haidian Street, Block A, 8th Floor, Haidian District, Beijing, China
                [2] Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
                Article
                Article number: 357
                DOI: 10.1186/s12911-016-0357-5
                PMCID: PMC5006527
                PMID: 27577240
                2a6a7a9b-d921-4058-8cda-c2c4318dc375
                © The Author(s). 2016

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 21 February 2016
                Accepted: 23 August 2016
                Categories
                Technical Advance

                Bioinformatics & Computational biology
                Keywords: big data, big medical data, clinical research, clinical decision support system, cloud-based system, double data entry, electronic medical record, health service research, structured data and unstructured data
