      Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis


          Abstract

          Background

          In the United States, 795,000 people suffer strokes each year; 10–15% of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected, and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite manual chart abstraction by automatically filtering out reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings, thus potentially reducing effort, costs, and time.

          Methods

          In this pilot study, we conducted an information content analysis of carotid stenosis mentions in terms of their report location (sections), report format (structures), and linguistic descriptions (expressions) in Veteran Health Administration free-text reports. We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant carotid stenosis findings given these three document composition factors for two report types: radiology (RAD) reports and text integration utility (TIU) notes.
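
As a rough illustration of the three document composition factors, the sketch below splits a report into sections and labels the type of stenosis expression found in a piece of text. It is not pyConText and does not reproduce the study's method. The header pattern, the expression labels (`range`, `exact`, `category`), the cue words, and the function names `split_sections` and `expression_type` are all assumptions made for this example.

```python
import re
from typing import Dict, Optional

# Assumed section headers; real RAD reports and TIU notes vary widely.
SECTION_RE = re.compile(r"^(FINDINGS|IMPRESSION)\s*:", re.IGNORECASE | re.MULTILINE)

# Assumed patterns for how a degree of stenosis may be written.
RANGE_RE = re.compile(r"\d{1,3}\s*(?:-|to)\s*\d{1,3}\s*%")   # e.g. "70-99%"
EXACT_RE = re.compile(r"\d{1,3}\s*%")                         # e.g. "80%"
CATEGORY_CUES = ("mild", "moderate", "severe", "critical", "significant")


def split_sections(report: str) -> Dict[str, str]:
    """Map section name to text; text before the first header goes under 'OTHER'."""
    sections: Dict[str, str] = {}
    matches = list(SECTION_RE.finditer(report))
    first = matches[0].start() if matches else len(report)
    preamble = report[:first].strip()
    if preamble:
        sections["OTHER"] = preamble
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1).upper()] = report[m.end():end].strip()
    return sections


def expression_type(text: str) -> Optional[str]:
    """Label how stenosis is described, or None if stenosis is not mentioned."""
    lowered = text.lower()
    if "stenosis" not in lowered:
        return None
    if RANGE_RE.search(lowered):
        return "range"
    if EXACT_RE.search(lowered):
        return "exact"
    if any(cue in lowered for cue in CATEGORY_CUES):
        return "category"
    return None


# Made-up example report, not taken from the study corpus.
report = (
    "Carotid duplex.\n"
    "FINDINGS: 70-99% stenosis of the right ICA.\n"
    "IMPRESSION: Severe stenosis of the right internal carotid artery."
)
for name, body in split_sections(report).items():
    print(name, "->", expression_type(body))
```

A note with no recognizable headers ends up entirely under `OTHER`, so any mention in such a note would be counted as falling in neither the Findings nor the Impression section.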

          Results

          We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88%), specificity (84%), and negative predictive value (95%) and reasonable positive predictive value (70%). For TIU notes, pyConText performed with high specificity (87%) and negative predictive value (92%), reasonable sensitivity (73%), and moderate positive predictive value (58%). pyConText performed with the highest sensitivity when processing the full report rather than the Findings or Impression sections independently.
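
The four measures reported above all follow from a 2×2 confusion matrix built against the reference standard. The sketch below is a minimal illustration; the function name `diagnostic_metrics` and the counts passed to it are hypothetical and are not the study's data.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute report-level measures from confusion-matrix counts.

    tp: reports with significant stenosis that were flagged
    fp: reports without significant stenosis that were flagged
    fn: reports with significant stenosis that were filtered out
    tn: reports without significant stenosis that were filtered out
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of positive reports that are flagged
        "specificity": _ratio(tn, tn + fp),  # share of negative reports that are filtered
        "ppv": _ratio(tp, tp + fp),          # share of flagged reports that are positive
        "npv": _ratio(tn, tn + fn),          # share of filtered reports that are negative
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For a filtering task, NPV describes the reports that are set aside: a high NPV suggests that few reports with significant stenosis are among those filtered out, whereas a lower PPV mainly means that some flagged reports will turn out to be negative on manual review.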

          Conclusion

          We conclude that pyConText can reduce chart review efforts by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings from the Veteran Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.


          Most cited references (22)


          A simple algorithm for identifying negated findings and diseases in discharge summaries.

          Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline), a positive predictive value of 84.5% (versus 68.4% for the baseline) while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
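
The three ingredients named in this abstract (negation phrases, filtering of phrases that only look like negation, and a limited scope) can be sketched in a few lines. This is a simplified illustration and not the NegEx implementation: the phrase lists, the five-token window, the scope-terminating words, and the function name `is_negated` are assumptions, and only negation phrases that precede the finding are handled.

```python
import re

# Small illustrative phrase lists; a real lexicon would be much larger.
NEGATION_TRIGGERS = ("no evidence of", "no", "without", "denies")
PSEUDO_NEGATIONS = ("no increase", "not only")   # look like negation but are not
SCOPE_TERMINATORS = {"but", "however", "although"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


def is_negated(sentence: str, finding: str, max_scope: int = 5) -> bool:
    """Return True if a negation trigger precedes `finding` within its scope."""
    text = sentence.lower()
    # Step 1: drop phrases that falsely appear to be negation phrases.
    for phrase in PSEUDO_NEGATIONS:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    tokens = TOKEN_RE.findall(text)
    target = TOKEN_RE.findall(finding.lower())
    if not target:
        return False
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != target:
            continue
        # Step 2: limit the scope to a few tokens before the finding.
        window = tokens[max(0, i - max_scope):i]
        # Step 3: a terminator such as "but" cuts the scope short.
        for j in range(len(window) - 1, -1, -1):
            if window[j] in SCOPE_TERMINATORS:
                window = window[j + 1:]
                break
        joined = " " + " ".join(window) + " "
        if any(f" {trigger} " in joined for trigger in NEGATION_TRIGGERS):
            return True
    return False


# Made-up sentences for illustration.
print(is_negated("No evidence of significant stenosis.", "stenosis"))
print(is_negated("No plaque, but there is stenosis.", "stenosis"))
```

In the second sentence the word "but" ends the scope of "no", so the finding is treated as present.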

            Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors.

            We sought to determine which ICD-9-CM codes in Medicare Part A data identify cardiovascular and stroke risk factors. This was a cross-sectional study comparing ICD-9-CM data to structured medical record review from 23,657 Medicare beneficiaries aged 20 to 105 years who had atrial fibrillation. Quality improvement organizations used standardized abstraction instruments to determine the presence of 9 cardiovascular and stroke risk factors. Using the chart abstractions as the gold standard, we assessed the accuracy of ICD-9-CM codes to identify these risk factors. ICD-9-CM codes for all risk factors had high specificity (>0.95) and low sensitivity ( or =0.98) but moderate positive predictive values (range, 0.54-0.77) in this population. Using ICD-9-CM codes alone, heart failure, coronary artery disease, diabetes, hypertension, and stroke can be ruled in but not necessarily ruled out. Where feasible, review of additional data (eg, physician notes or imaging studies) should be used to confirm the diagnosis of valvular disease, arterial peripheral embolus, intracranial hemorrhage, and deep venous thrombosis.

              A review of approaches to identifying patient phenotype cohorts using electronic health records

              Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion: We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.

                Author and article information

                Contributors
                danielle.mowery@utah.edu
                Journal
                Journal of Biomedical Semantics (J Biomed Semantics)
                Publisher: BioMed Central (London)
                ISSN: 2041-1480
                Published: 10 May 2016
                Volume: 7
                Article number: 26
                Affiliations
                Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
                IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
                San Francisco Veteran Affair Health Care System, San Francisco, CA USA
                Article
                65
                10.1186/s13326-016-0065-1
                4863379
                27175226
                475fbfbb-73fa-4fa2-97f4-079569e04286
                © Mowery et al. 2016

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 21 October 2015
                Accepted: 19 April 2016
                Funding
                Funded by: National Institute of General Medical Sciences (FundRef http://dx.doi.org/10.13039/100000057); Award ID: NIGMS R01GM090187
                Funded by: U.S. National Library of Medicine (FundRef http://dx.doi.org/10.13039/100000092); Award ID: R01 LM010964
                Funded by: National Heart, Lung, and Blood Institute (FundRef http://dx.doi.org/10.13039/100000050); Award ID: 1R01HL114563-01A1
                Funded by: Quality Enhancement Research Initiative (FundRef http://dx.doi.org/10.13039/100007181); Award ID: HSR&D Stroke QUERI RRP 12-185
                Categories
                Research
                Custom metadata
                © The Author(s) 2016

                Bioinformatics & Computational biology
                natural language processing, stroke, phenotype, information extraction
