
      Supervised Learning and Knowledge-Based Approaches Applied to Biomedical Word Sense Disambiguation

Research article (open access), Journal of Integrative Bioinformatics, De Gruyter


          Abstract

Word sense disambiguation (WSD) is an important step in biomedical text mining: it assigns an unequivocal concept to an ambiguous term, improving the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, with the best results obtained by supervised means. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, concept textual definitions extracted from the UMLS database, and concept association values calculated from MeSH co-occurrence counts in MEDLINE articles. Also in the knowledge-based method, we tested different word embedding averaging functions to calculate the surrounding context vectors, with the goal of giving more weight to the words closest to the ambiguous term. Our methods were evaluated on the MSH WSD dataset, the most common dataset used for evaluating biomedical concept disambiguation. We obtained a top accuracy of 95.6% with the supervised method, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved the disambiguation accuracy, proving to be a powerful resource for the WSD task.
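To make the feature construction concrete, here is a minimal sketch of the context representations the abstract describes: bag-of-words local features, an unweighted embedding average for the supervised global features, and a distance-weighted average for the knowledge-based context vector. This is illustrative NumPy code under stated assumptions (a dict-like token-to-vector mapping such as a gensim KeyedVectors model; the 1/(1+d) weighting is one plausible choice, not necessarily the authors' exact averaging function):

```python
import numpy as np

def bow_features(tokens, vocabulary):
    """Local features: bag-of-words counts over a fixed
    vocabulary (token -> column index)."""
    vec = np.zeros(len(vocabulary))
    for tok in tokens:
        if tok in vocabulary:
            vec[vocabulary[tok]] += 1
    return vec

def mean_embedding(tokens, embeddings, dim):
    """Global features: unweighted mean of the context word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def weighted_context_vector(tokens, target_idx, embeddings, dim):
    """Knowledge-based variant: averaging function that gives more
    weight to words closer to the ambiguous term at target_idx."""
    num, den = np.zeros(dim), 0.0
    for i, tok in enumerate(tokens):
        if i == target_idx or tok not in embeddings:
            continue
        w = 1.0 / (1.0 + abs(i - target_idx))  # weight decays with distance
        num += w * np.asarray(embeddings[tok])
        den += w
    return num / den if den > 0.0 else num
```

In a setup like the one described, the bag-of-words and mean-embedding vectors would be concatenated and fed to a per-term supervised classifier, while the weighted context vector would be compared (for example by cosine similarity) against candidate-concept vectors built from UMLS definitions and MeSH co-occurrence profiles.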


Most cited references (14)


          Medical Subject Headings (MeSH).


            Deep learning with word embeddings improves biomedical named entity recognition

Abstract

Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult.

Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, the F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall.

Availability and implementation: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/.

Contact: habibima@informatik.hu-berlin.de
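The paper's actual implementation is the tagger released at https://github.com/glample/tagger; purely as an illustration of the LSTM-CRF idea, the following sketch assumes PyTorch plus the third-party pytorch-crf package, with hypothetical dimensions:

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    """Bidirectional LSTM producing per-token tag scores (emissions),
    with a CRF layer modelling tag-transition structure on top."""

    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, token_ids):
        out, _ = self.lstm(self.embed(token_ids))
        return self.to_tags(out)

    def loss(self, token_ids, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(self._emissions(token_ids), tags, mask=mask)

    def predict(self, token_ids, mask):
        # Viterbi decoding of the most likely tag sequence per sentence.
        return self.crf.decode(self._emissions(token_ids), mask=mask)
```

The CRF layer is what distinguishes this from a plain tagger: instead of scoring each token independently, it scores whole tag sequences, which matters for NER where tag order is constrained (e.g. an "inside" tag cannot follow "outside" in BIO schemes).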

              Efficient Estimation of Word Representations in Vector Space

              We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
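The two architectures proposed here (skip-gram and CBOW) are what the word2vec tool and, later, libraries such as gensim implement. A minimal gensim example on a toy corpus (the corpus and hyperparameters are illustrative; a real biomedical model would be trained on something like MEDLINE abstracts):

```python
from gensim.models import Word2Vec  # pip install gensim

# Toy corpus: each sentence is a list of tokens.
corpus = [
    ["protein", "binding", "site"],
    ["gene", "expression", "analysis"],
    ["protein", "expression", "levels"],
]

# sg=1 selects the skip-gram architecture; sg=0 selects CBOW.
# vector_size, window and epochs are illustrative hyperparameters.
model = Word2Vec(corpus, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=50)

# Nearest neighbours in the learned vector space.
print(model.wv.most_similar("protein", topn=2))
```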

                Author and article information

Journal: Journal of Integrative Bioinformatics (J Integr Bioinform, jib)
Publisher: De Gruyter
ISSN: 1613-4516
Publication date: 13 December 2017 (December 2017)
Volume: 14
Issue: 4
Article number: 20170051
Affiliations: DETI/IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
Article ID: jib-2017-0051
DOI: 10.1515/jib-2017-0051
PMCID: 6042812
PMID: 29236676
ScienceOpen ID: 50a95e06-d2f9-44a7-86b5-ea526e6d375d
©2017, Rui Antunes and Sérgio Matos, published by De Gruyter, Berlin/Boston

                This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

History: Received 17 August 2017; Revised 9 September 2017; Accepted 11 November 2017
Counts: Tables: 8, References: 24, Pages: 8
Categories: Original Articles

Keywords: biomedical text mining, information extraction, word embeddings
