
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Adapting Bidirectional Encoder Representations from Transformers (BERT) to Assess Clinical Semantic Textual Similarity: Algorithm Development and Validation Study

      research-article

      Read this article at: ScienceOpen | Publisher | PMC

          Abstract

          Background

          Natural Language Understanding enables automatic extraction of relevant information from clinical text data, which are acquired every day in hospitals. In 2018, the language model Bidirectional Encoder Representations from Transformers (BERT) was introduced, generating new state-of-the-art results on several downstream tasks. The National NLP Clinical Challenges (n2c2) is an initiative that strives to tackle such downstream tasks on domain-specific clinical data. In this paper, we present the results of our participation in the 2019 n2c2 and related work completed thereafter.

          Objective

          The objective of this study was to optimally leverage BERT for the task of assessing the semantic textual similarity of clinical text data.
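
          As a concrete illustration of this task setup, the sketch below scores a single clinical sentence pair with a BERT regression head using the Hugging Face Transformers library. The checkpoint, hyperparameters, and example sentences are assumptions chosen for illustration, not the authors' configuration, and the model would still need fine-tuning on the n2c2 data before its scores are meaningful.

          import torch
          from transformers import BertForSequenceClassification, BertTokenizer

          # Hypothetical checkpoint; the paper's exact model is not specified here.
          tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
          # num_labels=1 turns the classification head into a single regression
          # output (a similarity score) trained with mean-squared-error loss.
          model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

          sentence_a = "The patient was given 500 mg of acetaminophen."
          sentence_b = "Acetaminophen 500 mg was administered to the patient."

          inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt", truncation=True)
          with torch.no_grad():
              score = model(**inputs).logits.squeeze()  # meaningful only after fine-tuning
          print(float(score))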

          Methods

          We used BERT as an initial baseline and analyzed the results, which we used as a starting point to develop 3 different approaches where we (1) added additional, handcrafted sentence similarity features to the classifier token of BERT and combined the results with more features in multiple regression estimators, (2) incorporated a built-in ensembling method, M-Heads, into BERT by duplicating the regression head and applying an adapted training strategy to facilitate the focus of the heads on different input patterns of the medical sentences, and (3) developed a graph-based similarity approach for medications, which allows extrapolating similarities across known entities from the training set. The approaches were evaluated with the Pearson correlation coefficient between the predicted scores and ground truth of the official training and test dataset.
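
          The module below is a rough, hypothetical sketch of how approaches (1) and (2) could be combined: handcrafted sentence-pair features are concatenated to BERT's pooled classifier-token representation, and the regression head is duplicated into several M-Heads whose outputs are averaged at inference time. Class names, feature dimensions, and the averaging step are illustrative assumptions; the adapted training strategy that assigns inputs to individual heads is omitted.

          import torch
          import torch.nn as nn
          from transformers import BertModel

          class MHeadsSimilarityRegressor(nn.Module):
              """Illustrative BERT regressor with handcrafted features and M duplicated heads."""

              def __init__(self, num_heads: int = 5, num_handcrafted: int = 4):
                  super().__init__()
                  self.encoder = BertModel.from_pretrained("bert-base-uncased")
                  hidden = self.encoder.config.hidden_size
                  # One small regression head per "M-Head"; each sees the pooled [CLS]
                  # vector plus the handcrafted sentence-pair features.
                  self.heads = nn.ModuleList(
                      [nn.Linear(hidden + num_handcrafted, 1) for _ in range(num_heads)]
                  )

              def forward(self, input_ids, attention_mask, handcrafted_features):
                  cls = self.encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
                  x = torch.cat([cls, handcrafted_features], dim=-1)
                  # Each head predicts a score; averaging is one simple way to combine
                  # them at inference time (the head-assignment training strategy is omitted).
                  scores = torch.stack([head(x).squeeze(-1) for head in self.heads], dim=0)
                  return scores.mean(dim=0)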

          Results

          We improved the performance of BERT on the test dataset from a Pearson correlation coefficient of 0.859 to 0.883 using a combination of the M-Heads method and the graph-based similarity approach. We also show differences between the test and training datasets and how the two datasets influenced the results.
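
          The evaluation metric is straightforward to reproduce; the snippet below computes the Pearson correlation coefficient between predicted and ground-truth similarity scores with SciPy, using made-up values purely for illustration.

          from scipy.stats import pearsonr

          # Made-up scores on the 0-5 similarity scale used in clinical STS tasks.
          ground_truth = [0.0, 1.5, 3.0, 4.5, 5.0]
          predictions = [0.3, 1.2, 3.4, 4.1, 4.8]

          r, _ = pearsonr(ground_truth, predictions)
          print(f"Pearson correlation coefficient: {r:.3f}")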

          Conclusions

          We found that using a graph-based similarity approach has the potential to extrapolate domain-specific knowledge to unseen sentences. We observed that it is easy to obtain deceptive results on the test dataset, especially when the distribution of the data samples differs between the training and test datasets.
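
          As a toy illustration of this extrapolation idea, the sketch below builds a small graph over medication entities and converts graph distance into a similarity score with networkx. The entities, edges, and distance-to-score mapping are hypothetical and do not reflect the paper's actual medication graph.

          import networkx as nx

          # Hypothetical edges between medications that appeared as highly similar
          # pairs in the training data.
          G = nx.Graph()
          G.add_edges_from([
              ("acetaminophen", "paracetamol"),
              ("paracetamol", "ibuprofen"),
              ("ibuprofen", "naproxen"),
          ])

          def graph_similarity(a: str, b: str, max_score: float = 5.0) -> float:
              """Map shortest-path distance in the medication graph to a similarity score."""
              if a not in G or b not in G or not nx.has_path(G, a, b):
                  return 0.0
              distance = nx.shortest_path_length(G, a, b)
              return max_score / (1 + distance)

          # This pair never co-occurred in training but is scored via the graph.
          print(graph_similarity("acetaminophen", "naproxen"))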

          Related collections

          Most cited references (35)

          • Record: found
          • Abstract: not found
          • Article: not found

          ImageNet Large Scale Visual Recognition Challenge

            • Record: found
            • Abstract: found
            • Article: not found

            Popular Ensemble Methods: An Empirical Study

            An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund & Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier -- especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing their performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.
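
            To get a self-contained feel for the bagging-versus-boosting comparison described in the abstract above, the toy snippet below cross-validates both ensemble types over decision trees with scikit-learn; the synthetic dataset and settings are arbitrary and unrelated to the study's 23 data sets.

            from sklearn.datasets import make_classification
            from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
            from sklearn.model_selection import cross_val_score
            from sklearn.tree import DecisionTreeClassifier

            # Synthetic toy data, not the benchmark data sets used in the paper.
            X, y = make_classification(n_samples=500, n_features=20, random_state=0)

            bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)
            boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=25, random_state=0)

            print("bagging accuracy :", cross_val_score(bagging, X, y, cv=5).mean())
            print("boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())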
              • Record: found
              • Abstract: not found
              • Conference Proceedings: not found

              Supervised Learning of Universal Sentence Representations from Natural Language Inference Data


                Author and article information

                Contributors
                Journal
                JMIR Medical Informatics (JMIR Med Inform; JMI)
                JMIR Publications (Toronto, Canada)
                ISSN: 2291-9694
                Published: 3 February 2021
                Volume 9, Issue 2: e22795
                Affiliations
                [1] German Cancer Research Center (DKFZ), Heidelberg, Germany
                [2] Partner Site Heidelberg, German Cancer Consortium (DKTK), Heidelberg, Germany
                [3] Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
                [4] Heidelberg University, Heidelberg, Germany
                [5] Hochschule Mannheim University of Applied Sciences, Mannheim, Germany
                [6] Institute for Artificial Intelligence in Medicine (IKIM), University Medicine Essen, Essen, Germany
                Author notes
                Corresponding Author: Klaus Kades, k.kades@dkfz.de
                Author information
                https://orcid.org/0000-0002-9387-9944
                https://orcid.org/0000-0003-4469-8343
                https://orcid.org/0000-0002-5263-6786
                https://orcid.org/0000-0003-4326-8026
                https://orcid.org/0000-0002-5396-3543
                https://orcid.org/0000-0001-8686-0682
                https://orcid.org/0000-0002-6626-2463
                Article
                Article ID: v9i2e22795
                DOI: 10.2196/22795
                PMCID: PMC7889424
                PMID: 33533728
                ©Klaus Kades, Jan Sellner, Gregor Koehler, Peter M Full, T Y Emmy Lai, Jens Kleesiek, Klaus H Maier-Hein. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 03.02.2021.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.

                History
                28 July 2020
                8 October 2020
                3 December 2020
                22 December 2020
                Categories
                Original Paper

                Keywords: natural language processing, semantic textual similarity, national NLP clinical challenges, clinical text mining
