22
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      CADD: predicting the deleteriousness of variants throughout the human genome

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorders. CADD is an integrative annotation built from more than 60 genomic features, and can score human single nucleotide variants and short insertion and deletions anywhere in the reference assembly. CADD uses a machine learning model trained on a binary distinction between simulated de novo variants and variants that have arisen and become fixed in human populations since the split between humans and chimpanzees; the former are free of selective pressure and may thus include both neutral and deleterious alleles, while the latter are overwhelmingly neutral (or, at most, weakly deleterious) by virtue of having survived millions of years of purifying selection. Here we review the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38. We also present updates to our website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications. CADD scores, software and documentation are available at https://cadd.gs.washington.edu.

          Related collections

          Most cited references 27

          • Record: found
          • Abstract: not found
          • Article: not found

          Python for Scientific Computing

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs.

             Xiaoming Liu (corresponding) ,  Chunlei Wu,  Chang Li (2016)
            The purpose of the dbNSFP is to provide a one-stop resource for functional predictions and annotations for human nonsynonymous single-nucleotide variants (nsSNVs) and splice-site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome-sequencing study. A list of all potential nsSNVs and ssSNVs based on the human reference sequence were created and functional predictions and annotations were curated and compiled for each SNV. Here, we report a recent major update of the database to version 3.0. The SNV list has been rebuilt based on GENCODE 22 and currently the database includes 82,832,027 nsSNVs and ssSNVs. An attached database dbscSNV, which compiled all potential human SNVs within splicing consensus regions and their deleteriousness predictions, add another 15,030,459 potentially functional SNVs. Eleven prediction scores (MetaSVM, MetaLR, CADD, VEST3, PROVEAN, 4× fitCons, fathmm-MKL, and DANN) and allele frequencies from the UK10K cohorts and the Exome Aggregation Consortium (ExAC), among others, have been added. The original seven prediction scores in v2.0 (SIFT, 2× Polyphen2, LRT, MutationTaster, MutationAssessor, and FATHMM) as well as many SNV and gene functional annotations have been updated. dbNSFP v3.0 is freely available at http://sites.google.com/site/jpopgen/dbNSFP.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              DANN: a deep learning approach for annotating the pathogenicity of genetic variants.

              Annotating genetic variants, especially non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodology. All data and source code are available at https://cbcl.ics.uci.edu/public_data/DANN/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
                Bookmark

                Author and article information

                Journal
                Nucleic Acids Res
                Nucleic Acids Res
                nar
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                08 January 2019
                29 October 2018
                29 October 2018
                : 47
                : Database issue , Database issue
                : D886-D894
                Affiliations
                [1 ]Berlin Institute of Health (BIH), 10178 Berlin, Germany
                [2 ]Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
                [3 ]Department of Statistics and Biostatistics, University of Washington, Seattle, WA 98195, USA
                [4 ]HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
                [5 ]Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
                [6 ]Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
                Author notes
                To whom correspondence should be addressed. Tel: +49 30 450 543 004; Fax: +49 30 4507 543901; Email: martin.kircher@ 123456bihealth.de . Correspondence may also be addressed to Jay Shendure. Tel: +1 206 685 8543; Fax: +1 206 685 7301; Email: shendure@ 123456uw.edu
                Article
                gky1016
                10.1093/nar/gky1016
                6323892
                30371827
                © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                Page count
                Pages: 9
                Product
                Funding
                Funded by: National Cancer Institute 10.13039/100000054
                Award ID: 1R01CA197139
                Funded by: National Human Genome Research Institute 10.13039/100000051
                Award ID: 1U54HG006493
                Categories
                Database Issue

                Genetics

                Comments

                Comment on this article