
      A primer on deep learning in genomics


          Abstract

          Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.
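The abstract stays at a high level, so here is a minimal sketch of one idea that regulatory-genomics models commonly build on. It is not code from the primer or its online tutorial. DNA is one-hot encoded into four channels, and a convolutional filter slides along the sequence, scoring each window the way a motif detector would. In a trained network these filter weights are learned. Here the filter is hand-set to a hypothetical "TATA" motif for illustration. Only NumPy is assumed, and the names `one_hot`, `conv1d_scan` and `motif_filter` are illustrative, not from any library.

```python
import numpy as np

# Fixed channel order for the one-hot encoding (an assumption; any fixed order works).
BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) array; N or other symbols become all-zero rows."""
    seq = seq.upper()
    encoded = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded


def conv1d_scan(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a (width, 4) kernel along x with no padding ('valid' mode).

    Like deep learning libraries, this computes cross-correlation (the kernel is not
    flipped) and returns one score per window position.
    """
    width = kernel.shape[0]
    return np.array(
        [np.sum(x[i:i + width] * kernel) for i in range(x.shape[0] - width + 1)]
    )


# Hypothetical hand-set filter for the motif "TATA": +1 for the expected base, -1 otherwise.
motif_filter = 2.0 * one_hot("TATA") - 1.0

scores = conv1d_scan(one_hot("GCTATAAGC"), motif_filter)
# A ReLU keeps only positive evidence, as the activation after a convolutional layer would.
activations = np.maximum(scores, 0.0)
```

For the example sequence, `scores` is `[0, -4, 4, -2, 0, -2]`. The only positive activation is at index 2, where "TATA" occurs exactly. A convolutional layer applies many such filters in parallel and learns their weights from data instead of having them fixed by hand.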



                Author and article information

                Journal: Nature Genetics (Nat Genet)
                Publisher: Springer Nature America, Inc
                ISSN: 1061-4036 (print); 1546-1718 (online)
                Published: November 26 2018
                Type: Article
                DOI: 10.1038/s41588-018-0295-5
                PMID: 30478442
                © 2018
                License: http://www.springer.com/tdm
