      Is Open Access

      Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling

      research-article


          Abstract

          We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task, in which our system demonstrated state-of-the-art performance with the highest achieved results in named entity recognition. In this paper we describe the original conditional random field-based system used in the shared task as well as experiments conducted since, including better hyperparameter tuning and character-level modeling, which led to further performance improvements. For normalizing the mentions into unique identifiers we use fuzzy character n-gram matching. The normalization approach has also been improved with a better abbreviation resolution method and stricter guideline compliance, resulting in vastly improved results for various entity types. All tools and models used for both named entity recognition and normalization are publicly available under an open license.
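          The fuzzy character n-gram matching mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which n-gram size or similarity measure the authors use, so the character trigrams, cosine similarity, threshold value, function names and toy lexicon below are all assumptions made for illustration, not the published implementation.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention, lexicon, threshold=0.5):
    """Return (identifier, score) for the most similar lexicon name.

    The identifier is None when the best score is below the threshold
    (the threshold is an assumed parameter, not taken from the article).
    """
    query = char_ngrams(mention)
    best_id, best_score = None, 0.0
    for name, identifier in lexicon.items():
        score = cosine(query, char_ngrams(name))
        if score > best_score:
            best_id, best_score = identifier, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical toy lexicon; the identifiers are placeholders, not real ontology IDs.
LEXICON = {
    "liver": "TOY:0001",
    "heart": "TOY:0002",
    "kidney": "TOY:0003",
}

if __name__ == "__main__":
    # An inflected mention still maps to the closest lexicon name.
    print(normalize("livers", LEXICON))
    # An unrelated string falls below the threshold and stays unnormalized.
    print(normalize("xyz", LEXICON))
```

          Matching on character n-grams rather than whole tokens lets spelling variants and inflected forms of a mention still score highly against the lexicon entry they derive from, which is the property a fuzzy normalizer relies on.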


                Author and article information

                Journal
                Database (Oxford)
                Database: The Journal of Biological Databases and Curation
                Publisher: Oxford University Press
                ISSN: 1758-0463
                Published: 18 September 2018
                Volume 2018, article bay096
                Affiliations
                [1] Turku Centre for Computer Science, Turku, Finland
                [2] Department of Future Technologies, University of Turku, Turku, Finland
                [3] University of Turku Graduate School, Turku, Finland
                Author notes
                Corresponding author: Tel. +358 2 333 7649; Email: sukaew@utu.fi

                These authors contributed equally to this work.

                Article
                Article ID: bay096
                DOI: 10.1093/database/bay096
                PMCID: PMC6146133
                PMID: 30239666
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                1 March 2018
                16 August 2018
                17 August 2018
                Page count
                Pages: 10
                Funding
                Funded by: ATT Tieto käyttöön
                Categories
                Original Article

                Bioinformatics & Computational biology
