
      Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall

      research-article


          Abstract

          Awareness of the adverse effects of chemicals is important in biomedical research and healthcare. Text mining can allow timely and low-cost extraction of this knowledge from the biomedical literature. We extended our text mining solution, LeadMine, to identify diseases and chemical-induced disease relationships (CIDs). LeadMine is a dictionary/grammar-based entity recognizer and was used to recognize and normalize both chemicals and diseases to Medical Subject Headings (MeSH) IDs. The disease lexicon was obtained from three sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon. Composite entities (e.g. heart and lung disease) were detected and mapped to their composite MeSH IDs. For CIDs, we developed a simple pattern-based system to find relationships within the same sentence. Our system was evaluated in the BioCreative V Chemical–Disease Relation task and achieved very good results for both disease concept ID recognition (F1-score: 86.12%) and CIDs (F1-score: 52.20%) on the test set. As our system was over an order of magnitude faster than other solutions evaluated on the task, we were able to apply the same system to the entirety of MEDLINE, allowing us to extract a collection of over 250 000 distinct CIDs.
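          The abstract outlines three steps: dictionary lookup normalized to concept IDs, mapping of composite mentions such as "heart and lung disease" to composite IDs, and pattern-based relation extraction restricted to a single sentence. The Python sketch below illustrates that general approach under stated assumptions. The lexicons, the placeholder IDs (CHEM_X, DIS_HEART and so on), the trigger phrases, the "modifier and modifier head" composite pattern and the pipe-joined composite ID format are all hypothetical choices made for illustration; this is not LeadMine's implementation or API, and the toy trigger heuristic is not the authors' pattern set.

```python
import re

# Hypothetical toy lexicons mapping lower-cased surface forms to placeholder
# concept IDs; a real system would normalize to MeSH IDs instead.
CHEMICAL_LEXICON = {"drug x": "CHEM_X"}
DISEASE_LEXICON = {
    "heart disease": "DIS_HEART",
    "lung disease": "DIS_LUNG",
    "hepatitis": "DIS_HEPATITIS",
}

# Hypothetical relation triggers (an assumption, not the authors' patterns).
TRIGGER = re.compile(r"\b(induced|caused by|associated with)\b", re.IGNORECASE)

# Assumed composite form: "<modifier> and <modifier> <head noun>".
COMPOSITE = re.compile(r"\b(\w+) and (\w+) (\w+)\b")


def dictionary_matches(sentence, lexicon):
    """Return (mention, ID) pairs for whole-word, case-insensitive lexicon hits."""
    text = sentence.lower()
    return [
        (term, concept_id)
        for term, concept_id in lexicon.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def disease_mentions(sentence):
    """Dictionary hits plus composite mentions mapped to a pipe-joined ID."""
    mentions = dictionary_matches(sentence, DISEASE_LEXICON)
    for first, second, head in COMPOSITE.findall(sentence.lower()):
        # Rebuild each implied full term, e.g. "heart disease" and "lung disease".
        left = DISEASE_LEXICON.get(f"{first} {head}")
        right = DISEASE_LEXICON.get(f"{second} {head}")
        if left and right:
            mentions.append((f"{first} and {second} {head}", f"{left}|{right}"))
    return mentions


def extract_cids(text):
    """Pair chemicals and diseases that share a sentence containing a trigger."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not TRIGGER.search(sentence):
            continue  # same-sentence pattern required; skip trigger-less sentences
        chemicals = [cid for _, cid in dictionary_matches(sentence, CHEMICAL_LEXICON)]
        for _, disease_id in disease_mentions(sentence):
            for part in disease_id.split("|"):  # expand composite IDs into parts
                pairs.update((chem, part) for chem in chemicals)
    return pairs


if __name__ == "__main__":
    print(extract_cids("Drug X induced heart and lung disease in two patients."))
```

          Requiring the chemical, the disease and a trigger to share one sentence keeps the heuristic cheap, which mirrors the speed argument in the abstract, at the cost of missing relations stated across sentences.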


              Author and article information

              Journal: Database: The Journal of Biological Databases and Curation (Database (Oxford))
              Publisher: Oxford University Press
              ISSN: 1758-0463
              Published: 08 April 2016
              Volume: 2016
              Article ID: baw039
              Affiliations
              NextMove Software Ltd, Innovation Centre, Unit 23, Science Park, Milton Road, Cambridge, United Kingdom
              Author notes
              *Corresponding author: Email: daniel@nextmovesoftware.com

              Citation details: Lowe,D.M., O'Boyle,N.M., Sayle,R.A. Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall. Database (2016) Vol. 2016: article ID baw039; doi:10.1093/database/baw039

              DOI: 10.1093/database/baw039
              PMCID: PMC4825350
              PMID: 27060160
              © The Author(s) 2016. Published by Oxford University Press.

              This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

              History
              Received: 04 December 2015
              Revised: 29 February 2016
              Accepted: 02 March 2016
              Page count: 6
              Categories
              Original Article

              Bioinformatics & Computational biology
