Opportunities and obstacles for deep learning in biology and medicine

Journal of the Royal Society Interface

The Royal Society

deep learning, genomics, precision medicine, machine learning

      Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems—patient classification, fundamental biological processes and treatment of patients—and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

            Author and article information

            [1] Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
            [2] Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
            [3] Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
            [4] Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
            [5] Harvard Medical School, Boston, MA, USA
            [6] Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
            [7] Data Science Institute, Imperial College London, London, UK
            [8] Princess Margaret Cancer Centre, Toronto, Ontario, Canada
            [9] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
            [10] Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
            [11] Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
            [12] Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
            [13] Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
            [14] Biophysics Program, Stanford University, Stanford, CA, USA
            [15] Department of Computer Science, Stanford University, Stanford, CA, USA
            [16] Department of Genetics, Stanford University, Stanford, CA, USA
            [17] Department of Computer Science, University of Virginia, Charlottesville, VA, USA
            [18] Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
            [19] Toyota Technological Institute at Chicago, Chicago, IL, USA
            [20] Department of Computer Science, Trinity University, San Antonio, TX, USA
            [21] Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
            [22] Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
            [23] Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
            [24] National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
            [25] Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
            [26] Austin, TX, USA
            [27] Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
            [28] Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
            [29] Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
            [30] Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
            [31] Department of Medicine, Brown University, Providence, RI, USA
            [32] Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
            [33] Morgridge Institute for Research, Madison, WI, USA
            Author notes

            Author order was determined with a randomized algorithm.

            J R Soc Interface
            4 April 2018
            Volume 15, Issue 141
            © 2018 The Authors.

            Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.

            Funded by: Gordon and Betty Moore Foundation
            Award ID: GBMF 4552
            Award ID: GBMF 4563
            Funded by: National Institutes of Health
            Award ID: DP2GM123485
            Award ID: P30CA051008
            Award ID: R01AI116794
            Award ID: R01GM089652
            Award ID: R01GM089753
            Award ID: R01LM012222
            Award ID: R01LM012482
            Award ID: R21CA220398
            Award ID: T32GM007753
            Award ID: T32HG000046
            Award ID: U54AI117924
            Funded by: Roy and Diana Vagelos Scholars Program in the Molecular Life Sciences
            Funded by: U.S. National Library of Medicine
            Award ID: Intramural Research Program
            Funded by: National Science Foundation
            Award ID: 1245632
            Award ID: 1531594
            Award ID: 1564955
            Funded by: Natural Sciences and Engineering Research Council of Canada
            Award ID: RGPIN-2015-3948
            Funded by: Howard Hughes Medical Institute
            Review Articles
            Headline Review

            Life sciences
