      Is Open Access

      The eICU Collaborative Research Database, a freely available multi-center database for critical care research



          Critical care patients are monitored closely through the course of their illness. As a result of this monitoring, large amounts of data are routinely collected for these patients. Philips Healthcare has developed a telehealth system, the eICU Program, which leverages these data to support management of critically ill patients. Here we describe the eICU Collaborative Research Database, a multi-center intensive care unit (ICU) database with high-granularity data for over 200,000 admissions to ICUs monitored by eICU Programs across the United States. The database is deidentified, and includes vital sign measurements, care plan documentation, severity of illness measures, diagnosis information, treatment information, and more. Data are publicly available after registration, including completion of a training course in research with human subjects and signing of a data use agreement mandating responsible handling of the data and adhering to the principle of collaborative research. The freely available nature of the data will support a number of applications including the development of machine learning algorithms, decision support tools, and clinical research.

          Most cited references (14)

          Multiparameter Intelligent Monitoring in Intensive Care II: a public-access intensive care unit database.

          Objective: We sought to develop an intensive care unit research database applying automated techniques to aggregate high-resolution diagnostic and therapeutic data from a large, diverse population of adult intensive care unit patients. This freely available database is intended to support epidemiologic research in critical care medicine and serve as a resource to evaluate new clinical decision support and monitoring algorithms. Design: Data collection and retrospective analysis. Setting: All adult intensive care units (medical intensive care unit, surgical intensive care unit, cardiac care unit, cardiac surgery recovery unit) at a tertiary care hospital. Patients: Adult patients admitted to intensive care units between 2001 and 2007. Interventions: None. Measurements and Main Results: The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database consists of 25,328 intensive care unit stays. The investigators collected detailed information about intensive care unit patient stays, including laboratory data, therapeutic intervention profiles such as vasoactive medication drip rates and ventilator settings, nursing progress notes, discharge summaries, radiology reports, provider order entry data, International Classification of Diseases, 9th Revision codes, and, for a subset of patients, high-resolution vital sign trends and waveforms. Data were automatically deidentified to comply with Health Insurance Portability and Accountability Act standards and integrated with relational database software to create electronic intensive care unit records for each patient stay. The data were made freely available in February 2010 through the Internet along with a detailed user's guide and an assortment of data processing tools. The overall hospital mortality rate was 11.7%, which varied by critical care unit. The median intensive care unit length of stay was 2.2 days (interquartile range, 1.1-4.4 days). According to the primary International Classification of Diseases, 9th Revision codes, the following disease categories each comprised at least 5% of the case records: diseases of the circulatory system (39.1%); trauma (10.2%); diseases of the digestive system (9.7%); pulmonary diseases (9.0%); infectious diseases (7.0%); and neoplasms (6.8%). Conclusions: MIMIC-II documents a diverse and very large population of intensive care unit patient stays and contains comprehensive and detailed clinical data, including physiological waveforms and minute-by-minute trends for a subset of records. It establishes a new public-access resource for critical care research, supporting a diverse range of analytic studies spanning epidemiology, clinical decision-rule development, and electronic tool development.

            Machine Learning and Decision Support in Critical Care

            Clinical data management systems typically provide caregiver teams with useful information, derived from large, sometimes highly heterogeneous, data sources that are often changing dynamically. Over the last decade there has been a significant surge in interest in using these data sources, from simply re-using the standard clinical databases for event prediction or decision support, to including dynamic and patient-specific information into clinical monitoring and prediction problems. However, in most cases, commercial clinical databases have been designed to document clinical activity for reporting, liability and billing reasons, rather than for developing new algorithms. With increasing excitement surrounding “secondary use of medical records” and “Big Data” analytics, it is important to understand the limitations of current databases and what needs to change in order to enter an era of “precision medicine.” This review article covers many of the issues involved in the collection and preprocessing of critical care data. The three challenges in critical care are considered: compartmentalization, corruption, and complexity. A range of applications addressing these issues are covered, including the modernization of static acuity scoring; on-line patient tracking; personalized prediction and risk assessment; artifact detection; state estimation; and incorporation of multimodal data sources such as genomic and free text data.

              Automated de-identification of free-text medical records

              Background: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale, automated de-identification. Methods: We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI, and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus. Results: Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus.
              Conclusion: We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software out-performs a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently generalized and can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient to be used to publicly disseminate medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.
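
As a rough illustration of the approach the abstract above describes (look-up tables, regular expressions, and simple heuristics, scored by recall and precision against a gold standard), the following Python sketch flags candidate PHI spans in a note and evaluates them. It is not the published Perl package: the tiny name table, the three patterns, the bracketed placeholder format, and the function names `find_phi`, `scrub`, and `recall_precision` are all assumptions made for this example, and a usable system would need far larger dictionaries and many more patterns.

```python
import re

# Hypothetical look-up table; a real system would load large name dictionaries.
KNOWN_NAMES = {"smith", "jones", "garcia"}

# Illustrative patterns only; these are NOT the patterns of the published software.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
TITLE_RE = re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.?\s+([A-Z][a-z]+)")
WORD_RE = re.compile(r"\b[A-Za-z]+\b")


def find_phi(text):
    """Return sorted (start, end, label) spans flagged as candidate PHI."""
    spans = set()
    for m in DATE_RE.finditer(text):
        spans.add((m.start(), m.end(), "DATE"))
    for m in PHONE_RE.finditer(text):
        spans.add((m.start(), m.end(), "PHONE"))
    # Heuristic: a capitalized word right after a title is treated as a name.
    for m in TITLE_RE.finditer(text):
        spans.add((m.start(1), m.end(1), "NAME"))
    # Dictionary look-up: any word present in the name table.
    for m in WORD_RE.finditer(text):
        if m.group().lower() in KNOWN_NAMES:
            spans.add((m.start(), m.end(), "NAME"))
    return sorted(spans)


def scrub(text):
    """Replace each flagged span with a bracketed category placeholder."""
    out, last = [], 0
    for start, end, label in find_phi(text):
        if start < last:  # skip a span that overlaps one already replaced
            continue
        out.append(text[last:start])
        out.append(f"[{label}]")
        last = end
    out.append(text[last:])
    return "".join(out)


def recall_precision(predicted, gold):
    """Score predicted (start, end, label) spans against gold (start, end) pairs."""
    pred = {(start, end) for start, end, _ in predicted}
    gold = set(gold)
    true_pos = len(pred & gold)
    recall = true_pos / len(gold) if gold else 1.0
    precision = true_pos / len(pred) if pred else 1.0
    return recall, precision


if __name__ == "__main__":
    note = "Pt seen by Dr. Patel on 3/14/2007; wife Mrs Smith called 617-555-0199."
    print(scrub(note))
```

Scoring on exact character spans is the strictest choice; an evaluation could instead count a prediction as correct when it merely overlaps a gold span, which would raise both figures.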

                Author and article information

                Journal: Scientific Data (Sci Data)
                Publisher: Nature Publishing Group
                Published: 11 September 2018
                Volume: 5
                Article number: 180178
                [1] Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
                [2] Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
                [3] Department of eICU Research and Development, Philips Healthcare, Baltimore, MD 21202, USA
                [4] Department of Pharmacy Practice and Science, University of Maryland, School of Pharmacy, Baltimore, MD 21201, USA
                Author notes
                [a] A.E.W.J. (email: aewj@mit.edu).

                These authors contributed equally to this work.


                A.E.W.J. and T.J.P. collaborated to publish the data and write the paper. J.D.R. performed sample selection, provided the documentation for the process, and collaborated on the paper. L.A.C., R.G.M., and O.B. reviewed the paper and supervised the work.

                Copyright © 2018, The Author(s)

                Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the metadata files made available in this article.

                Received: 24 January 2018
                Accepted: 21 June 2018
                Data Descriptor

                databases, health care

