      Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine

      research-article

          Abstract

          The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs: it reports the genetic and molecular interactions that provide the scaffold for cellular regulatory systems and details the influence of genetic variants on these interactions. The BioCreative VI Precision Medicine Track aimed to extract this particular type of information and was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein–protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When the text-mining system predictions were compared with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
          Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
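
          The abstract reports precision, recall and F-score without spelling out how they are computed. The sketch below shows the standard set-based definitions, assuming system output and human annotations are both available as sets of identifiers. The function name and the toy PMIDs are hypothetical; this is not the track's official evaluation script, and it does not cover average precision or the homologous-gene matching mentioned above.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (balanced F-score).

    `predicted` and `gold` are collections of hashable items, for example
    document IDs (triage) or protein pairs (relation extraction).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up document IDs (assumption, not track data).
gold_docs = {"PMID:1", "PMID:2", "PMID:3", "PMID:4"}
predicted_docs = {"PMID:1", "PMID:2", "PMID:5"}

p, r, f = precision_recall_f1(predicted_docs, gold_docs)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

          Because each "best" figure in the abstract may come from a different submission, the reported best F-score cannot be recomputed from the reported best precision and best recall; the formula only holds within a single system's output.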

                Author and article information

                Journal
                Database: The Journal of Biological Databases and Curation (Database (Oxford))
                Oxford University Press
                ISSN: 1758-0463
                Published: 28 January 2019
                Volume: 2019
                Article ID: bay147
                Affiliations
                [1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
                [2] Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada
                [3] Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
                [4] School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
                [5] Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
                [6] School of Computer Science and Technology, Dalian University of Technology, Dalian, China
                [7] Department of Computer Engineering, Marmara University, Istanbul, Turkey
                [8] Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
                [9] School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece
                [10] Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
                [11] Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
                [12] Department of Computer Science, University of Kentucky, Lexington, KY, USA
                [13] Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
                [14] College of Computer Science and Technology, Dalian University of Technology, Dalian, China
                [15] Department of Statistics, Florida State University, Florida, USA
                Author notes
                Corresponding author: Tel.: 1 301 594 7089; Fax: 1 301 480 2288; Email: zhiyong.lu@nih.gov
                Author information
                http://orcid.org/0000-0003-3533-8872
                http://orcid.org/0000-0002-6036-1516
                http://orcid.org/0000-0002-8661-1544
                http://orcid.org/0000-0003-1052-7626
                http://orcid.org/0000-0001-8376-1056
                http://orcid.org/0000-0001-8166-0681
                http://orcid.org/0000-0002-5141-0259
                http://orcid.org/0000-0001-5871-6245
                Article
                Article ID: bay147
                DOI: 10.1093/database/bay147
                PMCID: PMC6348314
                PMID: 30689846
                Published by Oxford University Press 2019.

                This work is written by US Government employees and is in the public domain in the US.

                History
                Received: 2 March 2018
                Revised: 17 December 2018
                Accepted: 19 December 2018
                Page count
                Pages: 17
                Funding
                Funded by: National Institutes of Health Office of Research Infrastructure Programs
                Award ID: R24OD011194
                Award ID: R01OD010929
                Funded by: National Institutes of Health Intramural Research Program National Library of Medicine
                Categories
                Original Article

                Bioinformatics & Computational biology
