
      MedLinker: Medical Entity Linking with Neural Representations and Dictionary Matching



          Abstract

          Progress in the field of Natural Language Processing (NLP) has been closely followed by applications in the medical domain. Recent advancements in Neural Language Models (NLMs) have transformed the field and are currently motivating numerous works exploring their application in different domains. In this paper, we explore how NLMs can be used for Medical Entity Linking with the recently introduced MedMentions dataset, which presents two major challenges: (1) a large target ontology of over 2M concepts, and (2) low overlap between concepts in train, validation and test sets. We introduce a solution, MedLinker, that addresses these issues by leveraging specialized NLMs with Approximate Dictionary Matching, and show that it performs competitively on semantic type linking, while improving the state-of-the-art on the more fine-grained task of concept linking (+4 F1 on MedMentions main task).
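The approximate dictionary matching that MedLinker combines with neural representations can be illustrated with a minimal sketch: mentions are linked to a concept dictionary by character trigram overlap rather than exact string match, so inflected or slightly varied surface forms still resolve. The toy dictionary, CUIs, similarity measure, and threshold below are illustrative assumptions, not MedLinker's actual configuration.

```python
def char_ngrams(text, n=3):
    """Character n-grams of a lowercased string, with boundary padding."""
    s = f"##{text.lower()}##"
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def link_mention(mention, dictionary, threshold=0.5, n=3):
    """Return (concept_id, score) for the best approximate match, or None."""
    grams = char_ngrams(mention, n)
    best = None
    for surface, cui in dictionary.items():
        score = jaccard(grams, char_ngrams(surface, n))
        if score >= threshold and (best is None or score > best[1]):
            best = (cui, score)
    return best

# Toy UMLS-style dictionary: surface form -> concept ID (illustrative CUIs).
toy_dict = {
    "myocardial infarction": "C0027051",
    "diabetes mellitus": "C0011849",
    "hypertension": "C0020538",
}

# The plural variant still links to the right concept despite no exact match.
print(link_mention("myocardial infarctions", toy_dict))
```

A production system would use an indexed approximate-match structure instead of a linear scan, since the target ontology here has over 2M concepts.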


Most cited references (2)

Open Access

          BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Abstract

Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora.

Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.

Availability and implementation: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
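The "word distribution shift" the BioBERT authors describe shows up concretely at the subword level: under a general-domain vocabulary, a biomedical term shatters into many fragments, while a domain vocabulary keeps it whole. A minimal WordPiece-style greedy tokenizer makes this visible; the toy vocabularies below are invented for illustration and BERT's real tokenizer differs in details.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first subword tokenization (WordPiece-style)."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            sub = word[start:end]
            cand = sub if start == 0 else "##" + sub  # continuation prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:          # no subword covers this position
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

# Toy general-domain vocabulary: the medical term only exists as fragments.
general_vocab = {"th", "##rom", "##bo", "##cy", "##to", "##pen", "##ia"}
# Toy biomedical vocabulary: the term is a single known token.
bio_vocab = general_vocab | {"thrombocytopenia"}

print(wordpiece("thrombocytopenia", general_vocab))  # many fragments
print(wordpiece("thrombocytopenia", bio_vocab))      # one token
```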

            TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

            Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization.
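A semi-Markov decoder scores multi-token segments directly rather than tagging tokens one at a time, which is what lets recognition and normalization happen in a single joint step instead of a cascading pipeline. A minimal dynamic-programming sketch follows; the lexicon, segment scores, and the restriction of non-entity segments to single tokens are illustrative assumptions, not TaggerOne's actual learned model.

```python
def semi_markov_decode(tokens, lexicon, max_len=4):
    """Viterbi over segmentations: each segment is either a lexicon entity
    (jointly recognized and normalized to its concept ID) or an O token."""
    n = len(tokens)
    best = [(-float("inf"), [])] * (n + 1)  # best[i]: (score, segments) for tokens[:i]
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            span = " ".join(tokens[j:i]).lower()
            if span in lexicon:
                # Entity segment: reward it and attach its concept ID.
                score, seg = best[j][0] + 2.0, best[j][1] + [(span, lexicon[span])]
            elif i - j == 1:
                # Non-entity segments are single O-labeled tokens.
                score, seg = best[j][0], best[j][1] + [(span, "O")]
            else:
                continue
            if score > best[i][0]:
                best[i] = (score, seg)
    return best[n][1]

# Illustrative lexicon mapping surface forms to concept IDs.
lexicon = {"heart attack": "C0027051", "aspirin": "C0004057"}
tokens = "patient had a heart attack treated with aspirin".split()
print(semi_markov_decode(tokens, lexicon))
```

Because the multi-token span "heart attack" is scored as one unit, its normalization evidence directly informs the segmentation decision, which is exactly the coupling a serial NER-then-normalize pipeline loses.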

              Author and article information

              Contributors
              joemon.jose@glasgow.ac.uk
              emine.yilmaz@ucl.ac.uk
              jm.magalhaes@fct.unl.pt
              pablo.castells@uam.es
              ferro@dei.unipd.it
              mjs@inesc-id.pt
              flaviomartins@acm.org
              dloureiro@fc.up.pt
              amjorge@fc.up.pt
Journal
Advances in Information Retrieval
42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II
ISBN: 978-3-030-45441-8, 978-3-030-45442-5
DOI: 10.1007/978-3-030-45442-5
Published: 24 March 2020
Volume: 12036
Pages: 230–237
Affiliations
[8] GRID grid.8756.c, ISNI 0000 0001 2193 314X, University of Glasgow, Glasgow, UK
[9] GRID grid.83440.3b, ISNI 0000000121901201, University College London, London, UK
[10] GRID grid.10772.33, ISNI 0000000121511713, Universidade NOVA de Lisboa, Lisbon, Portugal
[11] GRID grid.5515.4, ISNI 0000000119578126, Universidad Autónoma de Madrid, Madrid, Spain
[12] GRID grid.5608.b, ISNI 0000 0004 1757 3470, University of Padua, Padua, Italy
[13] GRID grid.9983.b, ISNI 0000 0001 2181 4263, Universidade de Lisboa, Lisbon, Portugal
[14] GRID grid.10772.33, ISNI 0000000121511713, Universidade NOVA de Lisboa, Lisbon, Portugal
LIAAD - INESC TEC, Porto, Portugal
Article
Chapter: 29
DOI: 10.1007/978-3-030-45442-5_29
PMCID: PMC7148021
              © Springer Nature Switzerland AG 2020

              This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.


Keywords: entity linking, bioinformatics, neural language models
