
      A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions

      research-article


          Abstract

          Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities.

          Database URL: http://ctdbase.org/


          Most cited references (31)


          Gene Ontology: tool for the unification of biology

          Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

            Data Mining of the Public Version of the FDA Adverse Event Reporting System

            The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS, formerly AERS) is a database that contains information on adverse event and medication error reports submitted to the FDA. Besides those from manufacturers, reports can be submitted from health care professionals and the public. The original system was started in 1969, but since the last major revision in 1997, reporting has markedly increased. Data mining algorithms have been developed for the quantitative detection of signals from such a large database, where a signal means a statistical association between a drug and an adverse event or a drug-associated adverse event, including the proportional reporting ratio (PRR), the reporting odds ratio (ROR), the information component (IC), and the empirical Bayes geometric mean (EBGM). A survey of our previous reports suggested that the ROR provided the highest number of signals, and the EBGM the lowest. Additionally, an analysis of warfarin-, aspirin- and clopidogrel-associated adverse events suggested that all EBGM-based signals were included in the PRR-based signals, and also in the IC- or ROR-based ones, and that the PRR- and IC-based signals were in the ROR-based ones. In this article, the latest information on this area is summarized for future pharmacoepidemiological studies and/or pharmacovigilance analyses.

              The Comparative Toxicogenomics Database: update 2013

              The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) provides information about interactions between environmental chemicals and gene products and their relationships to diseases. Chemical–gene, chemical–disease and gene–disease interactions manually curated from the literature are integrated to generate expanded networks and predict many novel associations between different data types. CTD now contains over 15 million toxicogenomic relationships. To navigate this sea of data, we added several new features, including DiseaseComps (which finds comparable diseases that share toxicogenomic profiles), statistical scoring for inferred gene–disease and pathway–chemical relationships, filtering options for several tools to refine user analysis and our new Gene Set Enricher (which provides biological annotations that are enriched for gene sets). To improve data visualization, we added a Cytoscape Web view to our ChemComps feature, included color-coded interactions and created a ‘slim list’ for our MEDIC disease vocabulary (allowing diseases to be grouped for meta-analysis, visualization and better data management). CTD continues to promote interoperability with external databases by providing content and cross-links to their sites. Together, this wealth of expanded chemical–gene–disease data, combined with novel ways to analyze and view content, continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases.

                Author and article information

                Journal
                Database (Oxford)
                Database: The Journal of Biological Databases and Curation
                Oxford University Press
                1758-0463
                2013
                28 November 2013
                Volume: 2013
                Article ID: bat080
                Affiliations
                1 Department of Biological Sciences, 3510 Thomas Hall, North Carolina State University, Raleigh, NC 27695-7617, USA
                2 Computational Sciences Center of Emphasis, 200 Cambridgepark Drive, Pfizer Inc., Cambridge, MA 02139, USA
                3 Department of Bioinformatics, P.O. Box 35, Old Bar Harbor Road, MDI Biological Laboratory, Salisbury Cove, ME 04672, USA
                4 Compound Safety Prediction, MS 8118-B3, Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA
                5 Computational Sciences Center of Emphasis, Pfizer Inc., Ramsgate Road, Sandwich, Kent CT13 9NJ, UK
                6 Computational Sciences Center of Emphasis, 558 Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA
                7 Drug Safety Research and Development, 558 Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA
                Author notes
                * Corresponding author: Tel: 207-288-3605; Fax: 207-288-2130; Email: apdavis3@ncsu.edu

                Present address: Kevin J. McConnell, Momenta Pharmaceuticals, 675 West Kendall Street, Cambridge, MA 02142, USA.

                Present address: Robert Hernandez, RDI, AstraZeneca, Alderley Park, Macclesfield, Cheshire, SK10 4TG, UK.

                Present address: Ahmed E. Enayetallah, Translational Medicine, 14 Cambridge Center, Biogen Idec, Cambridge, MA 02142, USA.

                Article
                bat080
                DOI: 10.1093/database/bat080
                PMCID: PMC3842776
                PMID: 24288140
                © The Author(s) 2013. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 1 October 2013
                Revised: 5 November 2013
                Accepted: 9 November 2013
                Page count
                Pages: 16
                Categories
                Original Article

                Bioinformatics & Computational biology
