      Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases

      research-article


          Abstract

          Background

          Clinical interpretation of genetic variants in the context of the patient’s phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation.

          Methods

          We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole-genome or whole-exome sequencing (WGS, WES). We replicated our analyses in a separate cohort of 60 cases collected from five academic medical centers. For comparison, we also analyzed these cases with current state-of-the-art variant prioritization tools. Included in the comparisons were trio, duo, and singleton cases. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted from clinical notes by two means: manually and using an automated clinical natural language processing (CNLP) tool. Finally, 14 previously unsolved cases were reanalyzed.

          Results

          GEM ranked over 90% of the causal genes as the top or second candidate and prioritized for review a median of 3 candidate genes per case, using either manually curated or CNLP-derived phenotype descriptions. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SV as the top candidate, and in 19 of 20 it ranked the causal SV within the top five, irrespective of whether SV calls were provided or inferred ab initio by GEM using its own internal SV detection algorithm. GEM showed similar performance in the absence of parental genotypes. Analysis of 14 previously unsolved cases resulted in a novel finding for one case, candidates that were ultimately not advanced upon manual review for 3 cases, and no new findings for 10 cases.
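The ranking metrics reported above (the share of cases with the causal gene at or near the top of the list, and the median number of candidates per case) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `top_k_rate` and all numbers below are hypothetical toy values chosen only to show the arithmetic, and they are not data from the study.

```python
from statistics import median


def top_k_rate(causal_ranks, k):
    """Fraction of cases whose causal gene is ranked k or better (1 = top)."""
    if not causal_ranks:
        raise ValueError("need at least one case")
    return sum(1 for rank in causal_ranks if rank <= k) / len(causal_ranks)


# Hypothetical toy data, NOT taken from the study:
# rank of the known causal gene in each of 10 imaginary cases
toy_causal_ranks = [1, 1, 2, 1, 5, 1, 2, 1, 1, 12]
# number of candidate genes put forward for review in each imaginary case
toy_candidates_per_case = [3, 2, 4, 3, 6, 1, 3, 2, 5, 3]

top1 = top_k_rate(toy_causal_ranks, 1)   # causal gene is the top candidate
top2 = top_k_rate(toy_causal_ranks, 2)   # causal gene is first or second
median_candidates = median(toy_candidates_per_case)

print(f"top-1 rate: {top1:.0%}, top-2 rate: {top2:.0%}, "
      f"median candidates per case: {median_candidates}")
```

A benchmark of this kind needs only the rank of the known diagnosis in each case, so the same calculation applies to any prioritization tool being compared.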

          Conclusions

          GEM enabled diagnostic interpretation inclusive of all variant types through automated nomination of a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing cost and expediting case review.

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13073-021-00965-0.

                Author and article information

                Contributors
                francisco.delavega@stanford.edu
                mreese@fabricgenomics.com
                myandell@genetics.utah.edu
                Journal
                Genome Med
                Genome Medicine
                BioMed Central (London)
                ISSN: 1756-994X
                14 October 2021
                Volume: 13
                Article number: 153
                Affiliations
                [1] Fabric Genomics Inc., Oakland, CA USA
                [2] GRID grid.168010.e, ISNI 0000 0004 1936 8956, Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA USA
                [3] Current Address: Tempus Labs Inc., Redwood City, CA 94065 USA
                [4] GRID grid.286440.c, ISNI 0000 0004 0383 2910, Rady Children’s Institute for Genomic Medicine, San Diego, CA USA
                [5] GRID grid.223827.e, ISNI 0000 0001 2193 0096, Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT USA
                [6] GRID grid.38142.3c, ISNI 0000 0004 1936 754X, Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
                [7] GRID grid.2515.3, ISNI 0000 0004 0378 8438, Division of Newborn Medicine, Boston Children’s Hospital, Boston, MA USA
                [8] GRID grid.9764.c, ISNI 0000 0001 2153 9986, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel & University Hospital Schleswig-Holstein, Kiel, Germany
                [9] GRID grid.417691.c, ISNI 0000 0004 0408 3720, HudsonAlpha Institute for Biotechnology, Huntsville, AL USA
                [10] GRID grid.412269.a, ISNI 0000 0001 0585 7044, Department of Clinical Genetics, United Laboratories, Tartu University Hospital, Tartu, Estonia
                [11] GRID grid.10939.32, ISNI 0000 0001 0943 7661, Department of Clinical Genetics, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
                [12] GRID grid.250942.8, ISNI 0000 0004 0507 3225, Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ USA
                Author information
                http://orcid.org/0000-0002-9228-2097
                Article
                965
                10.1186/s13073-021-00965-0
                8515723
                34645491
                6bd6077d-5c96-4dc2-b854-8c896af4a135
                © The Author(s) 2021

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 22 March 2021
                Accepted: 27 August 2021
                Funding
                Funded by: TGen Foundation
                Funded by: Estonian Research Council
                Award ID: PUT355, PRG471, MOBTP175 and PUTJD827
                Funded by: FundRef http://dx.doi.org/10.13039/100000051, National Human Genome Research Institute
                Award ID: UM1 HG008900
                Award ID: HG009141
                Award ID: U54HD090255
                Funded by: FundRef http://dx.doi.org/10.13039/100005202, Muscular Dystrophy Association
                Award ID: MDA602235
                Categories
                Research
                Molecular medicine
