Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules

Abstract

Electronic medical records are an integral part of medical texts, and entity recognition in these records has motivated many studies and many entity extraction methods. In this paper, we propose a model for extracting entities from Chinese Electronic Medical Records (CEMR). In the input layer of the model, we use word embeddings and dictionary feature embeddings as input vectors, where the word embedding consists of a character representation and a word representation. The input vectors are then fed to a bidirectional long short-term memory (Bi-LSTM) network to capture contextual features. Finally, a conditional random field (CRF) is employed to capture dependencies between neighboring tags. On the body classification task, the F1 value reached 90.65%; on the anatomic region recognition task, it reached 93.89%. On both tasks, our model outperformed state-of-the-art models such as Bi-LSTM-CRF, Bi-LSTM-Attention, and Vote. The experiments show that our model handles low-frequency entities and unknown entities well; with a small training dataset, our method showed a 2–4% improvement in F1 value compared to the basic Bi-LSTM-CRF models. Additionally, on the anatomic region recognition task, we combined the proposed entity extraction model with 12 rules that we designed and a domain dictionary. With this combination, the weighted F1 value for extracting the three specific entities reached 84.36%.
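The abstract does not say how the dictionary features are computed, so the following is only a minimal sketch of one common approach: forward maximum matching of the text against a domain dictionary, which gives each character a position-and-type label that could then be embedded alongside the character and word representations. The function name `dictionary_features`, the `B/I/E/S/O` labeling scheme, the `BODY` type, and the three-entry lexicon are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def dictionary_features(chars: str, lexicon: Dict[str, str], max_len: int = 6) -> List[str]:
    """Label each character by forward maximum matching against a lexicon.

    Hypothetical scheme: B-/I-/E- mark the begin/inside/end of a multi-character
    dictionary match, S- marks a single-character match, and O marks no match.
    """
    tags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            span = chars[i:i + n]
            if span in lexicon:
                etype = lexicon[span]
                if n == 1:
                    tags[i] = f"S-{etype}"
                else:
                    tags[i] = f"B-{etype}"
                    for k in range(i + 1, i + n - 1):
                        tags[k] = f"I-{etype}"
                    tags[i + n - 1] = f"E-{etype}"
                i += n  # continue after the matched span
                matched = True
                break
        if not matched:
            i += 1  # no dictionary entry starts here
    return tags


# Hypothetical toy lexicon: left lung, lung, liver (all typed as BODY).
LEXICON = {"左肺": "BODY", "肺": "BODY", "肝": "BODY"}

# "No abnormality seen in the upper lobe of the left lung or in the liver."
sentence = "左肺上叶及肝未见异常"
features = dictionary_features(sentence, LEXICON)

for ch, tag in zip(sentence, features):
    print(ch, tag)
```

In a model of the kind described, each label would be mapped to an index and looked up in a trainable embedding table, and the resulting vector concatenated with the character and word embeddings before the Bi-LSTM layer. Longest-match-first is a design choice here: it prefers "左肺" (left lung) over the shorter entry "肺" (lung).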

Author and article information

Journal

Journal ID (nlm-ta): Int J Environ Res Public Health

Journal ID (iso-abbrev): Int J Environ Res Public Health

Journal ID (publisher-id): ijerph

Title: International Journal of Environmental Research and Public Health

Publisher: MDPI

ISSN (Print): 1661-7827

ISSN (Electronic): 1660-4601

Publication date (Electronic): 14 April 2020

Publication date (Print): April 2020

Volume: 17

Issue: 8

Electronic Location Identifier: 2687

Affiliations

[1] School of Computer, University of South China, Hengyang 421001, China; dragonc.cxl@gmail.com (X.C.); yongbinliu03@gmail.com (Y.L.)

[2] Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408, USA; buyi@umail.iu.edu

Author notes

[*] Correspondence: ouyangcp@gmail.com

Author information

Yi Bu https://orcid.org/0000-0003-2549-4580

Article

Publisher ID: ijerph-17-02687

DOI: 10.3390/ijerph17082687

PMC ID: 7215438

PubMed ID: 32295174

SO-VID: 489e3914-c76d-417d-91f2-840b82932961

License:

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
