      Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts

      research-article
      Global Biobank Meta-analysis Initiative
      Cell Genomics
      Elsevier
      Global-Biobank Meta-analysis Initiative, polygenic risk scores, multi-ancestry genetic prediction, accuracy heterogeneity

          Summary

          Polygenic risk scores (PRSs) have been widely explored in precision medicine. However, few studies have thoroughly investigated their best practices in global populations across different diseases. Here, we used data from the Global Biobank Meta-analysis Initiative (GBMI) to explore methodological considerations and PRS performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRSs using pruning and thresholding (P + T) and PRS-continuous shrinkage (CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or higher prediction accuracy compared with several other non-European-based panels. PRS-CS overall outperformed the classic P + T method, especially for endpoints with higher SNP-based heritability. Notably, prediction accuracy is heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using GBMI resources and highlight the importance of best practices for PRS in the biobank-scale genomics era.
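The pruning and thresholding (P + T) idea named in the summary can be illustrated with a small sketch. This is not the authors' pipeline: the function names, SNP identifiers, effect sizes, and both thresholds below are hypothetical toy values chosen only to show the two steps (keep SNPs below a p-value threshold, then greedily drop SNPs in LD with a more significant SNP already kept) followed by the weighted sum that defines the score.

```python
from typing import Dict, List, Tuple


def prune_and_threshold(
    pvalues: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    p_threshold: float = 5e-8,   # illustrative value, not taken from the paper
    r2_threshold: float = 0.1,   # illustrative value, not taken from the paper
) -> List[str]:
    """Greedy P + T: keep SNPs with p <= p_threshold, most significant first,
    skipping any SNP whose r2 with an already kept SNP exceeds r2_threshold."""
    candidates = sorted(
        (snp for snp, p in pvalues.items() if p <= p_threshold),
        key=lambda snp: pvalues[snp],
    )
    kept: List[str] = []
    for snp in candidates:
        # Pairwise r2 is stored once per pair; missing pairs are treated as 0.
        in_ld = any(
            r2.get((snp, k), r2.get((k, snp), 0.0)) > r2_threshold for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


def polygenic_score(
    dosages: Dict[str, float], betas: Dict[str, float], snps: List[str]
) -> float:
    """PRS for one individual: sum of effect size times allele dosage."""
    return sum(betas[s] * dosages[s] for s in snps)


# Toy inputs (all hypothetical).
pvalues = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 1e-3, "rs4": 2e-8}
r2 = {("rs1", "rs2"): 0.8, ("rs1", "rs4"): 0.01}
betas = {"rs1": 0.2, "rs2": 0.15, "rs3": 0.05, "rs4": -0.1}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}

selected = prune_and_threshold(pvalues, r2)
score = polygenic_score(dosages, betas, selected)
print(selected, round(score, 6))
```

In this toy input, rs3 fails the p-value threshold and rs2 is dropped because it is in strong LD with the more significant rs1. PRS-CS differs in that it keeps the SNPs and shrinks their effect sizes instead of selecting a subset, which this sketch does not attempt to show.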

          Highlights

          • PRS accuracy is heterogeneous across disease endpoints, ancestries, and biobanks

          • Larger sample sizes and greater diversity of GBMI improve PRS accuracy

          • Lessons and guidelines for developing PRS with multi-ancestry GWASs are provided

          Abstract

          Wang et al. used the unique resource of the Global Biobank Meta-analysis Initiative to develop and evaluate PRSs for 14 disease endpoints with varying genetic architectures and prevalences. They developed guidelines regarding the effects of multi-ancestry and heterogeneous GWASs, trait-specific genetic architecture, and PRS methods on prediction performance across diverse populations.


                Author and article information

                Journal
                Cell Genom
                Cell Genomics
                Elsevier
                ISSN: 2666-979X
                04 January 2023
                11 January 2023
                04 January 2023
                Volume: 3
                Issue: 1
                Article number: 100241
                Affiliations
                [1] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
                [2] Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
                [3] Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
                [4] Department of Genetics, UMCG, University of Groningen, Groningen, the Netherlands
                [5] Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
                [6] Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
                [7] Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
                [8] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48103, USA
                [9] Ontario Institute for Cancer Research, Toronto, ON, Canada
                [10] K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7030 Trondheim, Norway
                [11] Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
                [12] HUNT Research Centre, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7600 Levanger, Norway
                [13] Clinic of Medicine, St. Olav’s Hospital, Trondheim University Hospital, 7030 Trondheim, Norway
                [14] Oncode Institute, Utrecht, the Netherlands
                [15] Department of Ophthalmology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
                [16] Department of Clinical Genetics, Amsterdam University Medical Center (AMC), Amsterdam, the Netherlands
                [17] Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
                [18] Division of Molecular Pathology, Institute of Medical Science, the University of Tokyo, Tokyo, Japan
                [19] Institute for Genetics and Biomedical Research (IRGB), National Research Council (CNR), 09100 Cagliari, Italy
                [20] Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
                [21] Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
                [22] Department of Biostatistics and Center for Statistical Genetics, and Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
                [23] Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
                [24] MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
                [25] Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
                [26] Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
                [27] Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC) and Center for Infectious Disease Education and Research (CiDER), Osaka University, Suita 565-0871, Japan
                [28] Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-0033, Japan
                Author notes
                [∗] Corresponding author: yiwang@broadinstitute.org
                [∗∗] Corresponding author: armartin@broadinstitute.org
                [∗∗∗] Corresponding author: jibril.hirbo@vumc.org
                [29] These authors contributed equally
                [30] Lead contact

                Article
                PII: S2666-979X(22)00204-X
                DOI: 10.1016/j.xgen.2022.100241
                PMCID: PMC9903818
                PMID: 36777179
                © 2022 The Author(s)

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                History
                Received: 1 December 2021
                Revised: 28 August 2022
                Accepted: 3 December 2022
                Categories
                Article

