      Extracting chemical–protein relations with ensembles of SVM and deep learning models

      research-article

          Abstract

          Mining relations between chemicals and proteins from the biomedical literature is an increasingly important task. The CHEMPROT track at BioCreative VI aims to promote the development and evaluation of systems that can automatically detect the chemical–protein relations in running text (PubMed abstracts). This work describes our CHEMPROT track entry, which is an ensemble of three systems, including a support vector machine, a convolutional neural network, and a recurrent neural network. Their output is combined using majority voting or stacking for final predictions. Our CHEMPROT system obtained 0.7266 in precision and 0.5735 in recall for an F-score of 0.6410 during the challenge, demonstrating the effectiveness of machine learning-based approaches for automatic relation extraction from biomedical literature and achieving the highest performance in the task during the 2017 challenge.
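A minimal sketch of the majority-voting step described in the abstract, together with a check that the reported precision and recall give the reported F-score. The label strings, the example predictions, and the fallback used when no label wins a strict majority are assumptions made for illustration; the abstract does not say how ties are resolved. Stacking, the alternative combination method, is not shown.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str], fallback: str = "NONE") -> str:
    """Return the label predicted by more than half of the models.

    If no label has a strict majority, return `fallback`. This tie rule is
    an assumption of the sketch, not something stated in the abstract.
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count * 2 > len(predictions) else fallback


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-instance outputs of the three systems (SVM, CNN, RNN).
# The label names are illustrative placeholders.
svm_out = ["CPR:3", "NONE", "CPR:4"]
cnn_out = ["CPR:3", "CPR:9", "CPR:4"]
rnn_out = ["CPR:4", "NONE", "CPR:5"]

ensemble = [majority_vote(votes) for votes in zip(svm_out, cnn_out, rnn_out)]
print(ensemble)  # ['CPR:3', 'NONE', 'CPR:4']

# Precision and recall as reported in the abstract.
print(round(f_score(0.7266, 0.5735), 4))  # 0.641
```

With three voters and a strict-majority rule, a label is accepted only when at least two systems agree, which tends to favor precision over recall.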

          Database URL: http://www.biocreative.org/tasks/biocreative-vi/track-5/

                Author and article information

                Journal
                Database (Oxford)
                Database: The Journal of Biological Databases and Curation
                Oxford University Press
                ISSN: 1758-0463
                Published: 17 July 2018
                Volume: 2018
                Article ID: bay073
                Affiliations
                [1 ]National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
                [2 ]Department of Computer Science, University of Kentucky, Lexington, KY, USA
                [3 ]Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
                Author notes
                Corresponding author: Tel.: +1 301 594 7089; Fax: +1 301 480 2288; E-mail: zhiyong.lu@nih.gov

                Citation details: Peng,Y., Rios,A., Kavuluru,R. et al. Extracting chemical–protein relations with ensembles of SVM and deep learning models. Database (2018) Vol. 2018: article ID bay073; doi:10.1093/database/bay073

                Author information
                http://orcid.org/0000-0001-9309-8331
                Article
                Article ID: bay073
                DOI: 10.1093/database/bay073
                PMCID: PMC6051439
                PMID: 30020437
                740ee163-8d2f-4c3e-a5c4-720fd4b58f97
                Published by Oxford University Press 2018. This work is written by US Government employees and is in the public domain in the US.

                This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model ( https://academic.oup.com/journals/pages/about_us/legal/notices)

                History
                : 05 February 2018
                : 26 May 2018
                : 15 June 2018
                Page count
                Pages: 9
                Funding
                Funded by: National Library of Medicine
                Award ID: R21LM012274
                Categories
                Original Article

                Bioinformatics & Computational biology