D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information

Most cited references 13

An introduction to hidden Markov models

L. Rabiner, B. Juang (1986)

DNorm: disease name normalization with pairwise learning to rank

Robert Leaman, Rezarta Islamaj Doğan, Zhiyong Lu (2013)

Motivation: Despite the central role of diseases in biomedical research, there have been much fewer attempts to automatically determine which diseases are mentioned in a text—the task of disease name normalization (DNorm)—compared with other normalization tasks in biomedical text mining research.

Methods: In this article we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval.

Results: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. Our algorithm achieves 0.782 micro-averaged F-measure and 0.809 macro-averaged F-measure, an increase over the highest performing baseline method of 0.121 and 0.098, respectively.

Availability: The source code for DNorm is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/DNorm, along with a web-based demonstration and links to the NCBI disease corpus. Results on PubMed abstracts are available in PubTator: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator

Contact: zhiyong.lu@nih.gov
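The DNorm abstract above reports both a micro-averaged and a macro-averaged F-measure. As a minimal sketch of how the two averages can differ, the Python below computes both on toy data. It assumes that the macro average is taken per document and that each document is scored on its set of concept identifiers; the abstract does not spell out either detail, and the function names, document ids, and concept ids are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure from raw counts; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f(
    gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Return (micro F, macro F) over the documents listed in `gold`.

    Assumption: macro averaging is done per document, and each document is
    scored on the set of concept ids assigned to it.
    """
    total_tp = total_fp = total_fn = 0
    per_doc: List[float] = []
    for doc_id, gold_ids in gold.items():
        pred_ids = pred.get(doc_id, set())
        tp = len(gold_ids & pred_ids)   # concepts correctly predicted
        fp = len(pred_ids - gold_ids)   # predicted but not in gold
        fn = len(gold_ids - pred_ids)   # in gold but missed
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
        per_doc.append(f_measure(tp, fp, fn))
    micro = f_measure(total_tp, total_fp, total_fn)       # pool counts, then score
    macro = sum(per_doc) / len(per_doc) if per_doc else 0.0  # score, then average
    return micro, macro


# Toy data with hypothetical concept ids: one long document with misses,
# one short document that is fully correct.
gold = {"doc1": {"C1", "C2", "C3", "C4"}, "doc2": {"C5"}}
pred = {"doc1": {"C1", "C2"}, "doc2": {"C5"}}
micro, macro = micro_macro_f(gold, pred)
print(round(micro, 3), round(macro, 3))  # 0.75 0.833
```

In this toy case the macro average is higher because the short, fully correct document counts as much as the long one, whereas the micro average pools all counts before scoring.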

NCBI disease corpus: a resource for disease name recognition and concept normalization.

Zhiyong Lu, Rezarta Islamaj Doğan, Robert Leaman (2014)

Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information; however, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed the use of pre-annotations as a pre-step to manual annotations. Fourteen annotators were randomly paired and differing annotations were discussed for reaching a consensus in two annotation phases. In this setting, a high inter-annotator agreement was observed. Finally, all results were checked against annotations of the rest of the corpus to assure corpus-wide consistency. The public release of the NCBI disease corpus contains 6892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest contain an OMIM identifier. We were able to link 91% of the mentions to a single disease concept, while the rest are described as a combination of concepts. In order to help researchers use the corpus to design and test disease identification methods, we have prepared the corpus as training, testing and development sets.
To demonstrate its utility, we conducted a benchmarking experiment where we compared three different knowledge-based disease normalization methods with a best performance in F-measure of 63.7%. These results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard, thus enabling the development of machine-learning-based approaches for such tasks. The NCBI disease corpus, guidelines and other associated resources are available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/. Published by Elsevier Inc.

Author and article information

Journal

Title: Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Print): 1367-4803

ISSN (Electronic): 1460-2059

Publication date (Print): October 15 2018

Publication date (Electronic): April 30 2018

Volume: 34

Issue: 20

Pages: 3539-3546

Affiliations

[1] Department of Computational Science and Engineering, Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam

[2] Knowledge Technology Laboratory (KTLab), Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam

Article

DOI: 10.1093/bioinformatics/bty356

PubMed ID: 29718118

SO-VID: cb85ac60-a6d8-4201-ac2e-b49a05275caf

License:

https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
