
      TermInformer: unsupervised term mining and analysis in biomedical literature

      research-article


          Abstract

          Terminology is the most basic information that researchers and literature analysis systems need to understand. Mining terms and revealing the semantic relationships between them can help biomedical researchers address major health problems and motivate them to explore new research questions. However, mining terms from biomedical literature remains a challenge. Research on text segmentation in natural language processing (NLP) has not yet been applied well in the biomedical field. Named entity recognition models usually require a large training corpus, and the entity types they can recognize are limited. Dictionary-based methods match the text against pre-established vocabularies, so they can only match terms from a specific field, and collecting those terms is time-consuming and labour-intensive. Many scenarios in biomedical research are unsupervised, i.e. the corpora are unlabelled and the system may have little prior knowledge. This paper proposes the TermInformer project, which aims to mine the meaning of terms in an open fashion by calculating terms and to find solutions to some of the significant problems in our society. We propose an unsupervised method that automatically mines terms from text without relying on external resources and that can be applied to any document data. Combined with a word vector training algorithm, it yields reusable term embeddings that can be used in any downstream NLP application. We compare these term embeddings with existing word embeddings, and the results show that our method better reflects the semantic relationships between terms. Finally, we use the proposed method to find potential factors and treatments for lung cancer, breast cancer, and coronavirus.
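
The abstract does not spell out the mining algorithm, so the following is only a minimal sketch of one generic unsupervised approach to term mining: scoring adjacent word pairs by pointwise mutual information (PMI). It is not the TermInformer method. The function name `mine_bigram_terms`, the thresholds `min_count` and `min_pmi`, and the toy corpus are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def mine_bigram_terms(sentences, min_count=2, min_pmi=1.0):
    """Score adjacent word pairs by PMI and keep the strongly associated ones.

    Generic collocation sketch (an assumption for illustration), not the
    method proposed in the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate reliably
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            scored[(w1, w2)] = pmi
    return scored


# Toy, hand-written corpus (hypothetical; no external resources needed).
corpus = [
    "lung cancer is linked to smoking",
    "smoking raises the risk of lung cancer",
    "breast cancer screening is recommended",
    "the risk of breast cancer rises with age",
]
terms = mine_bigram_terms(corpus)
for pair, score in sorted(terms.items(), key=lambda kv: -kv[1]):
    print(" ".join(pair), round(score, 2))
```

Pairs kept by such a filter could be merged into single tokens before training word vectors, which is one common way to obtain embeddings for multi-word terms rather than for individual words.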


                Author and article information

                Contributors
                prayag.tiwari@unipd.it
                sagar.uprety@open.ac.uk
                shahram.dehdashti@qut.edu.au
                mshossain@ksu.edu.sa
                Journal
                Neural Comput Appl
                Neural Computing & Applications
                Springer London (London)
                ISSN 0941-0643 (print)
                ISSN 1433-3058 (electronic)
                16 September 2020
                Pages 1-14
                Affiliations
                [1] GRID grid.5608.b, ISNI 0000 0004 1757 3470, Department of Information Engineering, University of Padova, Padua, Italy
                [2] GRID grid.10837.3d, ISNI 0000 0000 9606 9301, The Open University, London, UK
                [3] GRID grid.1024.7, ISNI 0000 0000 8915 0953, School of Information Systems, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia
                [4] GRID grid.56302.32, ISNI 0000 0004 1773 5396, Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, 11543 Saudi Arabia
                Author information
                http://orcid.org/0000-0002-2851-4260
                https://orcid.org/0000-0001-5906-9422
                Article
                Article number: 5335
                DOI: 10.1007/s00521-020-05335-2
                PMC ID: 7494250
                © Springer-Verlag London Ltd., part of Springer Nature 2020

                This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

                History
                17 June 2020
                2 September 2020
                Categories
                S.I.: Data Fusion in the era of Data Science

                Neural & Evolutionary computing
                term mining, unsupervised learning, term embeddings, sequence labelling, GloVe, biomedical literature
