Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis

Abstract

Background

In the United States, 795,000 people suffer strokes each year; 10–15% of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected, and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite manual chart abstraction by automatically filtering out reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings, potentially reducing effort, cost, and time.

Methods

In this pilot study, we conducted an information content analysis of carotid stenosis mentions in Veteran Health Administration free-text reports in terms of their report location (sections), report format (structures), and linguistic description (expressions). We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant carotid stenosis findings given these three document composition factors, for two report types: radiology (RAD) reports and text integration utility (TIU) notes.
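To make the notion of report location concrete, the following is a minimal sketch of splitting a free-text report into sections and listing which sections mention stenosis. It is not pyConText and does not reproduce the study's method: the header pattern, the `split_sections` and `sections_with_mentions` helpers, and the sample report are all hypothetical, and real reports label their sections far less consistently.

```python
import re

# Hypothetical header pattern; assumes sections start a line with "FINDINGS:" or "IMPRESSION:".
SECTION_HEADER = re.compile(r"^(FINDINGS|IMPRESSION)S?:", re.IGNORECASE | re.MULTILINE)
# Matches "stenosis" or "stenoses" as a whole word.
STENOSIS = re.compile(r"\bstenos[ie]s\b", re.IGNORECASE)


def split_sections(report: str) -> dict:
    """Map each recognized section name to its text; leading unlabeled text goes under 'OTHER'."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(report))
    head_end = matches[0].start() if matches else len(report)
    if report[:head_end].strip():
        sections["OTHER"] = report[:head_end].strip()
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).upper()] = report[match.end():end].strip()
    return sections


def sections_with_mentions(report: str) -> list:
    """Return the names of the sections whose text mentions stenosis."""
    return [name for name, text in split_sections(report).items() if STENOSIS.search(text)]


# Invented example text, not taken from any clinical record.
sample = (
    "Carotid duplex ultrasound.\n"
    "FINDINGS: Right ICA 50-69% stenosis.\n"
    "IMPRESSION: Moderate right ICA stenosis."
)
print(sections_with_mentions(sample))
```

A note with no recognized headers ends up entirely under `OTHER`, which is the situation the analysis would have to handle when mentions fall outside the designated sections.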

Results

We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88%), specificity (84%), and negative predictive value (95%), and reasonable positive predictive value (70%). For TIU notes, pyConText performed with high specificity (87%) and negative predictive value (92%), reasonable sensitivity (73%), and moderate positive predictive value (58%). pyConText achieved its highest sensitivity when processing the full report rather than the Findings or Impression sections independently.
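For readers less familiar with the four measures reported above, the sketch below shows how each is computed from a confusion matrix. The `screening_metrics` helper is hypothetical and the counts are invented for illustration; they are not the study's data.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard screening measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of truly significant reports that were flagged
        "specificity": tn / (tn + fp),  # share of no/insignificant reports that were filtered out
        "ppv": tp / (tp + fp),          # share of flagged reports that are truly significant
        "npv": tn / (tn + fn),          # share of filtered reports that are truly no/insignificant
    }


# Invented counts: 10 significant reports and 90 no/insignificant reports.
metrics = screening_metrics(tp=8, fp=10, tn=80, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because positive reports are rare in this made-up sample, even a modest number of false positives pulls the positive predictive value well below the sensitivity, while the negative predictive value stays high. That is the general pattern to expect when most reports are negative.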

Conclusion

We conclude that pyConText can reduce chart review efforts by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings from the Veteran Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.

Most cited references 22

A simple algorithm for identifying negated findings and diseases in discharge summaries.

Wendy W. Chapman, Will Bridewell, Paul Hanbury … (2001)

Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline), a positive predictive value of 84.5% (versus 68.4% for the baseline) while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
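The three ingredients described in this abstract (negation phrases, a filter for phrases that only look like negations, and a limited scope) can be sketched as follows. This is an illustrative toy, not the published NegEx implementation: the phrase lists are tiny placeholders, the five-token `SCOPE_WINDOW` is an assumed value, and `is_negated` is a hypothetical helper name.

```python
import re

# Illustrative phrase lists only; a real lexicon would be much larger.
NEGATION_TRIGGERS = ["no evidence of", "without", "denies", "no"]
PSEUDO_NEGATIONS = ["no increase", "not cause"]
SCOPE_WINDOW = 5  # assumed: number of tokens after a trigger that fall within its scope


def is_negated(sentence: str, finding: str) -> bool:
    """Return True if `finding` appears within the scope of a negation trigger."""
    text = sentence.lower()
    # Mask pseudo-negation phrases so they cannot fire as triggers.
    for phrase in PSEUDO_NEGATIONS:
        text = text.replace(phrase, " ")
    tokens = re.findall(r"[a-z0-9%-]+", text)
    target = finding.lower().split()
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        for i in range(len(tokens) - len(trig) + 1):
            if tokens[i:i + len(trig)] != trig:
                continue
            # Only the tokens immediately after the trigger are considered negated.
            start = i + len(trig)
            window = tokens[start:start + SCOPE_WINDOW]
            for j in range(len(window) - len(target) + 1):
                if window[j:j + len(target)] == target:
                    return True
    return False


# Invented example sentences.
print(is_negated("No evidence of carotid stenosis.", "stenosis"))
print(is_negated("There is significant stenosis of the right ICA.", "stenosis"))
print(is_negated("No increase in stenosis.", "stenosis"))
```

Limiting the scope to a short window is what keeps a trigger early in a long sentence from negating a finding mentioned much later.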

Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors.

Yan Yan, Brian F Gage, Yan Yan … (2005)

We sought to determine which ICD-9-CM codes in Medicare Part A data identify cardiovascular and stroke risk factors. This was a cross-sectional study comparing ICD-9-CM data to structured medical record review from 23,657 Medicare beneficiaries aged 20 to 105 years who had atrial fibrillation. Quality improvement organizations used standardized abstraction instruments to determine the presence of 9 cardiovascular and stroke risk factors. Using the chart abstractions as the gold standard, we assessed the accuracy of ICD-9-CM codes to identify these risk factors. ICD-9-CM codes for all risk factors had high specificity (>0.95) and low sensitivity ([…] ≥0.98) but moderate positive predictive values (range, 0.54–0.77) in this population. Using ICD-9-CM codes alone, heart failure, coronary artery disease, diabetes, hypertension, and stroke can be ruled in but not necessarily ruled out. Where feasible, review of additional data (eg, physician notes or imaging studies) should be used to confirm the diagnosis of valvular disease, arterial peripheral embolus, intracranial hemorrhage, and deep venous thrombosis.

A review of approaches to identifying patient phenotype cohorts using electronic health records

Chaitanya Shivade, Preethi Raghavan, Eric Fosler-Lussier … (2013)

Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion: We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.

Author and article information

Contributors

Danielle L. Mowery: danielle.mowery@utah.edu

Journal

Journal ID (nlm-ta): J Biomed Semantics

Journal ID (iso-abbrev): J Biomed Semantics

Title: Journal of Biomedical Semantics

Publisher: BioMed Central (London)

ISSN (Electronic): 2041-1480

Publication date (Electronic): 10 May 2016

Publication date PMC-release: 10 May 2016

Publication date Collection: 2016

Volume: 7

Electronic Location Identifier: 26

Affiliations

Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA

IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT, USA

San Francisco Veteran Affair Health Care System, San Francisco, CA, USA

Article

Publisher ID: 65

DOI: 10.1186/s13326-016-0065-1

PMC ID: 4863379

PubMed ID: 27175226

SO-VID: 475fbfbb-73fa-4fa2-97f4-079569e04286

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 21 October 2015

Date accepted: 19 April 2016

Funding

Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences

Award ID: NIGMS R01GM090187

Award Recipient: Wendy W. Chapman

Funded by: FundRef http://dx.doi.org/10.13039/100000092, U.S. National Library of Medicine

Award ID: R01 LM010964

Award Recipient: Wendy W. Chapman

Funded by: FundRef http://dx.doi.org/10.13039/100000050, National Heart, Lung, and Blood Institute

Award ID: 1R01HL114563-01A1

Award Recipient: Salomeh Keyhani

Funded by: FundRef http://dx.doi.org/10.13039/100007181, Quality Enhancement Research Initiative

Award ID: HSR&D Stroke QUERI RRP 12-185

Award Recipient: Wendy W. Chapman

Custom metadata

ScienceOpen disciplines: Bioinformatics & Computational biology

Keywords: natural language processing, stroke, phenotype, information extraction

Related collections

Radiology and Natural Language Processing
