Supervised methods to extract clinical events from cardiology reports in Italian

Abstract

Clinical narratives are a valuable source of information for both patient care and biomedical research. Because medical reports are unstructured, dedicated automatic techniques are required to extract relevant entities from them. In the natural language processing (NLP) community, this task is often addressed with supervised methods, which require both reliably annotated corpora and carefully designed features. Despite recent advances in corpus collection and annotation, research across multiple domains and languages is still limited. In addition, computing the features required for supervised classification depends on suitable language- and domain-specific tools. In this work, we propose a novel application of recurrent neural networks (RNNs) to event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports, for a total of 4,365 mentions of relevant events and their attributes (e.g., polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set of 3,335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts in the text. A test set of 1,030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach increased recall from 52.4% to 88.9% and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained a final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively.
These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.
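
As a quick check on the figures above, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only and is not taken from the paper: the function name `f1_score` and the `reported` dictionary are hypothetical, and the values are simply the percentages quoted in the abstract, expressed as fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract for the 1,030-event test set.
# The configuration labels are shorthand chosen here, not names used by the authors.
reported = {
    "dictionary lookup": (0.811, 0.524),
    "RNN classifier": (0.882, 0.889),
    "combined": (0.886, 0.917),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

For the combined configuration this gives about 0.901, which agrees with the 90.1% F1 score stated in the abstract.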

Author and article information

Journal

Title: Journal of Biomedical Informatics

Abbreviated Title: Journal of Biomedical Informatics

Publisher: Elsevier BV

ISSN (Print): 15320464

Publication date Created: May 2019

Publication date (Print): May 2019

Page: 103219

Article

DOI: 10.1016/j.jbi.2019.103219

PMC ID: 6948016

PubMed ID: 31150777

SO-VID: 827ef25c-6cb5-4fff-a7bb-c4a37e9e6401

License:

https://www.elsevier.com/tdm/userlicense/1.0/
