1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Validating and automating learning of cardiometabolic polygenic risk scores from direct-to-consumer genetic and phenotypic data: implications for scaling precision health research

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Introduction

          A major challenge to enabling precision health at a global scale is the bias between those who enroll in state sponsored genomic research and those suffering from chronic disease. More than 30 million people have been genotyped by direct-to-consumer (DTC) companies such as 23andMe, Ancestry DNA, and MyHeritage, providing a potential mechanism for democratizing access to medical interventions and thus catalyzing improvements in patient outcomes as the cost of data acquisition drops. However, much of these data are sequestered in the initial provider network, without the ability for the scientific community to either access or validate. Here, we present a novel geno-pheno platform that integrates heterogeneous data sources and applies learnings to common chronic disease conditions including Type 2 diabetes (T2D) and hypertension.

          Methods

          We collected genotyped data from a novel DTC platform where participants upload their genotype data files and were invited to answer general health questionnaires regarding cardiometabolic traits over a period of 6 months. Quality control, imputation, and genome-wide association studies were performed on this dataset, and polygenic risk scores were built in a case–control setting using the BASIL algorithm.

          Results

          We collected data on N = 4,550 (389 cases / 4,161 controls) who reported being affected or previously affected for T2D and N = 4,528 (1,027 cases / 3,501 controls) for hypertension. We identified 164 out of 272 variants showing identical effect direction to previously reported genome-significant findings in Europeans. Performance metric of the PRS models was AUC = 0.68, which is comparable to previously published PRS models obtained with larger datasets including clinical biomarkers.

          Discussion

          DTC platforms have the potential of inverting research models of genome sequencing and phenotypic data acquisition. Quality control (QC) mechanisms proved to successfully enable traditional GWAS and PRS analyses. The direct participation of individuals has shown the potential to generate rich datasets enabling the creation of PRS cardiometabolic models. More importantly, federated learning of PRS from reuse of DTC data provides a mechanism for scaling precision health care delivery beyond the small number of countries who can afford to finance these efforts directly.

          Conclusions

          The genetics of T2D and hypertension have been studied extensively in controlled datasets, and various polygenic risk scores (PRS) have been developed. We developed predictive tools for both phenotypes trained with heterogeneous genotypic and phenotypic data generated outside of the clinical environment and show that our methods can recapitulate prior findings with fidelity. From these observations, we conclude that it is possible to leverage DTC genetic repositories to identify individuals at risk of debilitating diseases based on their unique genetic landscape so that informed, timely clinical interventions can be incorporated.

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s40246-022-00406-y.

          Related collections

          Most cited references52

          • Record: found
          • Abstract: found
          • Article: not found

          PLINK: a tool set for whole-genome association and population-based linkage analyses.

          Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            A global reference for human genetic variation

            The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Regression Shrinkage and Selection Via the Lasso

                Bookmark

                Author and article information

                Contributors
                tomo@genomelink.io
                carlos.bustamante@galatea.bio
                Journal
                Hum Genomics
                Hum Genomics
                Human Genomics
                BioMed Central (London )
                1473-9542
                1479-7364
                8 September 2022
                8 September 2022
                2022
                : 16
                : 37
                Affiliations
                [1 ]Galatea Bio, Inc., 975 W 22nd Street, Hialeah, Florida 33010 USA
                [2 ]Amphora Health, Batallon Independencia 80, Morelia, Michoacan 58260 Mexico
                [3 ]Genomelink, Inc., 2150 Shattuck Avenue, Berkeley, California 94704 USA
                [4 ]Awakens Japan K.K., 2-11-3 Meguro, Meguro-ku Tokyo, 1530063 Japan
                [5 ]GRID grid.168010.e, ISNI 0000000419368956, Department of Biomedical Data Science, , Stanford University School of Medicine, ; 1265 Welch Road, Stanford, California 94305 USA
                [6 ]GRID grid.499295.a, ISNI 0000 0004 9234 0175, Chan Zuckerberg Biohub, ; 499 Illinois Street, San Francisco, California 94158 USA
                Article
                406
                10.1186/s40246-022-00406-y
                9452874
                36076307
                335a6252-762b-4f01-9f7f-e66c207e1981
                © The Author(s) 2022

                Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 8 March 2022
                : 6 August 2022
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100003051, New Energy and Industrial Technology Development Organization;
                Award ID: JPNP19001
                Award ID: JPNP19001
                Award ID: JPNP19001
                Award ID: JPNP19001
                Award Recipient :
                Categories
                Research
                Custom metadata
                © The Author(s) 2022

                Genetics
                polygenic risk score,genome-wide association study,type 2 diabetes mellitus,hypertension,ancestry

                Comments

                Comment on this article