
      Semantic prioritization of novel causative genomic variants

      research-article


          Abstract

          Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.

          Author summary

          We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. Solving it can help clinicians arrive at a diagnosis and can also be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help determine the causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) that augments existing strategies by using the similarity of the patient's phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
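
The ranking idea described in the author summary can be illustrated with a small sketch. This is not the PVP implementation: it assumes a plain Jaccard overlap between phenotype term sets as a stand-in for ontology-based semantic similarity, and a hypothetical weighted average to combine that similarity with a per-variant pathogenicity score assumed to lie in [0, 1]. All names, term identifiers, and the `weight` parameter are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two sets of phenotype terms (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_variants(
    patient_phenotypes: Set[str],
    gene_phenotypes: Dict[str, Set[str]],
    variants: List[Tuple[str, str, float]],
    weight: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Rank (variant_id, gene, pathogenicity) tuples by a combined score.

    The combined score is a hypothetical weighted average of the variant's
    pathogenicity score and the phenotype similarity between the patient
    and the phenotypes already associated with the variant's gene.
    """
    scored = []
    for variant_id, gene, pathogenicity in variants:
        known = gene_phenotypes.get(gene, set())  # genes without data score 0.0
        similarity = jaccard(patient_phenotypes, known)
        combined = weight * pathogenicity + (1.0 - weight) * similarity
        scored.append((variant_id, gene, combined))
    # Highest combined score first
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Made-up placeholder data, not real ontology terms or genes
patient = {"PHENO:1", "PHENO:2", "PHENO:3"}
gene_phenotypes = {
    "GENE_A": {"PHENO:1", "PHENO:2"},
    "GENE_B": {"PHENO:9"},
}
variants = [("var1", "GENE_B", 0.9), ("var2", "GENE_A", 0.6)]

ranked = rank_variants(patient, gene_phenotypes, variants)
for variant_id, gene, score in ranked:
    print(variant_id, gene, round(score, 3))
```

In this toy data, the variant with the lower pathogenicity score ranks first because its gene's known phenotypes overlap with the patient's, which is the effect phenotype-based prioritization is meant to achieve.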

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 17 April 2017
                Volume 13, Issue 4, e1005500
                Affiliations
                [1] King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
                [2] Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
                [3] University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
                [4] College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
                [5] Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
                [6] Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
                [7] Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom
                Johns Hopkins University, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                • Conceptualization: RH PNS GVG.

                • Data curation: YH IB MK RBMR NS EGS.

                • Formal analysis: IB RH GVG PNS.

                • Funding acquisition: GVG RH NS.

                • Investigation: IB RBMR MK YH.

                • Methodology: IB GVG PNS RH VBB.

                • Project administration: RH PNS GVG.

                • Resources: NS EGS.

                • Software: IB MK RBMR RH.

                • Supervision: RH VBB PNS GVG NS.

                • Validation: IB NS EGS PNS GVG RH.

                • Visualization: IB.

                • Writing – original draft: IB GVG PNS RH.

                • Writing – review & editing: IB RBMR NS VBB PNS GVG RH.

                Author information
                http://orcid.org/0000-0002-8996-3975
                http://orcid.org/0000-0003-1710-1820
                http://orcid.org/0000-0002-9855-1139
                http://orcid.org/0000-0001-5435-4750
                http://orcid.org/0000-0003-0360-2130
                http://orcid.org/0000-0001-8149-5890
                Article
                Manuscript: PCOMPBIOL-D-16-01833
                DOI: 10.1371/journal.pcbi.1005500
                PMCID: 5411092
                PMID: 28414800
                cab37e7a-31fa-40cf-8cc5-d56f65579837
                © 2017 Boudellioua et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 8 November 2016
                Accepted: 4 April 2017
                Page count
                Figures: 2, Tables: 3, Pages: 21
                Funding
                Funded by: funder-id http://dx.doi.org/10.13039/100004440, Wellcome Trust; Award ID: 100585/Z/12/Z
                Funded by: National Institute for Health Research Cambridge Biomedical Research Centre
                Funded by: funder-id http://dx.doi.org/10.13039/100000001, National Science Foundation; Award ID: IOS-1340112
                Funded by: funder-id http://dx.doi.org/10.13039/100010661, Horizon 2020 Framework Programme; Award ID: 731075
                Funded by: funder-id http://dx.doi.org/10.13039/501100004052, King Abdullah University of Science and Technology
                NS was funded by the Wellcome Trust (Grant 100585/Z/12/Z) and the National Institute for Health Research Cambridge Biomedical Research Centre. IB, RBMR, MK, YH, VBB, and RH were funded by the King Abdullah University of Science and Technology. GVG acknowledges funding from the National Science Foundation (NSF grant number: IOS-1340112) and the European Commission H2020 (Grant Agreement No. 731075). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                • Research and Analysis Methods > Experimental Organism Systems > Model Organisms
                • Research and Analysis Methods > Model Organisms
                • Biology and Life Sciences > Genetics > Phenotypes
                • Research and Analysis Methods > Database and Informatics Methods > Biological Databases > Genomic Databases
                • Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Databases
                • Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Databases
                • Medicine and Health Sciences > Pathology and Laboratory Medicine > Pathogenesis
                • Research and Analysis Methods > Experimental Organism Systems > Model Organisms > Mouse Models
                • Research and Analysis Methods > Model Organisms > Mouse Models
                • Research and Analysis Methods > Experimental Organism Systems > Animal Models > Mouse Models
                • Biology and Life Sciences > Genetics > Genetic Loci > Alleles
                • Medicine and Health Sciences > Pharmacology > Drug Research and Development > Drug Design > Computer-Aided Drug Design
                • Biology and Life Sciences > Computational Biology > Genome Analysis
                • Biology and Life Sciences > Genetics > Genomics > Genome Analysis
                Custom metadata
                Source code developed for this project is available at https://github.com/bio-ontology-research-group/phenomenet-vp, and analysis results at http://www.cbrc.kaust.edu.sa/onto/pvp/. Data for the UK10K samples are available from the European Genome-Phenome Archive through the UK10K Data Access Committee (datasharing@sanger.ac.uk, https://www.uk10k.org/data_access.html) for researchers who meet the criteria for access to confidential data.

                Quantitative & Systems biology