Validity of Natural Language Processing for Ascertainment of EGFR and ALK Test Results in SEER Cases of Stage IV Non–Small-Cell Lung Cancer

Abstract

PURPOSE

SEER registries do not report results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non–small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for the ascertainment of EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases included in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR).

METHODS

We obtained 4,278 e-path reports from 1,634 patients who were diagnosed with stage IV nonsquamous NSCLC from September 1, 2011, to December 31, 2013, included in CSS. We used 855 CSS reports to train NLP systems for the ascertainment of EGFR and ALK test status (reported v not reported) and test results (positive v negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets.
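The four validity measures named above, plus the F score reported in the results, can be computed from a comparison of NLP output against the gold-standard labels. The following is a minimal sketch and not the authors' code: the function name `validity_metrics`, the toy labels, and the choice of F score as the harmonic mean of positive predictive value and sensitivity (F1) are all assumptions, since the abstract does not define the F score it uses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ValidityMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f_score: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def validity_metrics(gold: Sequence[bool], predicted: Sequence[bool]) -> ValidityMetrics:
    """Compare NLP labels against gold-standard labels (hypothetical helper)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    npv = _ratio(tn, tn + fn)
    # Assumption: F score taken as F1, the harmonic mean of PPV and sensitivity.
    f_score = _ratio(2 * tp, 2 * tp + fp + fn)
    return ValidityMetrics(sensitivity, specificity, ppv, npv, f_score)


# Toy labels for illustration only (True = test reported); not study data.
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
m = validity_metrics(gold, predicted)
print(m)
```

The same function would apply to either outcome in the study design, test status (reported v not reported) or test result (positive v negative), as long as each is encoded as a boolean per report or per patient.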

RESULTS

NLP systems yielded internal validity metrics that ranged from 0.95 to 1.00 for EGFR and ALK test status and results in CSS e-path reports. At the patient level, NLP showed high internal accuracy for the ascertainment of EGFR and ALK in CSS patients, with F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics that ranged from 0.02 to 0.96 in KCR reports and F scores of 0.70 and 0.72, respectively, in KCR patients.

CONCLUSION

NLP is an internally valid method for the ascertainment of EGFR and ALK test information from e-path reports available in SEER registries, but future work is necessary to increase NLP external validity.

Author and article information

Journal

Journal ID (nlm-ta): JCO Clin Cancer Inform

Journal ID (iso-abbrev): JCO Clin Cancer Inform

Journal ID (hwp): cci

Journal ID (pmc): cci

Journal ID (publisher-id): CCI

Title: JCO Clinical Cancer Informatics

Publisher: American Society of Clinical Oncology

ISSN (Electronic): 2473-4276

Publication date Collection: 2019

Publication date (Electronic): 6 May 2019

Volume: 3

Electronic Location Identifier: CCI.18.00098

Affiliations

[ ¹ ]Fred Hutchinson Cancer Research Center, Seattle, WA

[ ² ]University of Washington, Seattle, WA

[ ³ ]University of Kentucky, Lexington, KY

Author notes

Bernardo Haddock Lobo Goulart, MD, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, PO Box 19024, Seattle, WA 98109; e-mail: bgoulart@fredhutch.org.

Article

Accession ID: PMC6874053; PMCID: PMC6874053; PMC UID: 6874053; Publisher ID: 1800098

DOI: 10.1200/CCI.18.00098

PMC ID: 6874053

PubMed ID: 31058542

SO-VID: b963f4c7-fa47-40e3-a64a-96c93721bdad

History

Date accepted: 29 January 2019

Page count

Figures: 2, Tables: 6, Equations: 0, References: 21, Pages: 15
