      Analysis of polygenic risk score usage and performance in diverse human populations


          Abstract

          A historical tendency to use European ancestry samples hinders medical genetics research, including the use of polygenic scores, which are individual-level metrics of genetic risk. We analyze the first decade of polygenic scoring studies (2008–2017, inclusive), and find that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were among cohorts of African, Hispanic, or Indigenous peoples. We find that predictive performance of European ancestry-derived polygenic scores is lower in non-European ancestry samples (e.g. African ancestry samples: t = −5.97, df = 24, p = 3.7 × 10⁻⁶), and we demonstrate the effects of methodological choices in polygenic score distributions for worldwide populations. These findings highlight the need for improved treatment of linkage disequilibrium and variant frequencies when applying polygenic scoring to cohorts of non-European ancestry, and bolster the rationale for large-scale GWAS in diverse human populations.
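
          As a rough illustration of what an "individual-level metric of genetic risk" can look like, the sketch below computes a polygenic score as a weighted sum of effect-allele dosages. This is a generic textbook formulation, not necessarily the scoring method used in the studies surveyed; the function name `polygenic_score`, the toy dosages, and the toy weights are all hypothetical.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Return one score per individual: sum over variants of dosage * weight.

    dosages: (n_individuals, n_variants) array of effect-allele counts (0, 1, 2).
    weights: (n_variants,) array of per-variant effect sizes, e.g. from a GWAS.
    Both inputs are illustrative assumptions, not data from the article.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be 2-D with one column per weight")
    return dosages @ weights

# Toy data: three individuals, three variants (hypothetical values).
dosages = np.array([[0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 0]])
weights = np.array([0.10, -0.05, 0.20])
scores = polygenic_score(dosages, weights)
print(scores)
```

          Because the weights are usually estimated in one discovery population, differences in linkage disequilibrium and variant frequencies can change how well the same weighted sum predicts in another population, which is the concern the abstract raises.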

          Abstract

          Predominant participation of European-ancestry individuals in genetic studies has hindered the better understanding of genetic risk in non-European ancestry individuals. Here, Duncan et al. quantify polygenic risk score use and performance in worldwide populations.

          Most cited references (49)

          PLINK: a tool set for whole-genome association and population-based linkage analyses.

          Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
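
          To make the identity-by-state idea concrete, here is a minimal sketch of a pairwise identity-by-state similarity between two genotype vectors. It is an assumption-laden illustration rather than PLINK's implementation: the function name `ibs_similarity` is hypothetical, genotypes are assumed to be coded as allele counts (0, 1, 2), and missing calls are assumed to be NaN.

```python
import numpy as np

def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical by state between two individuals.

    At each marker the pair shares 2, 1, or 0 alleles, which equals
    2 - |g1 - g2| for genotypes coded as allele counts. Markers with a
    missing call (NaN) in either individual are skipped.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.shape != g2.shape:
        raise ValueError("genotype vectors must have the same length")
    observed = ~np.isnan(g1) & ~np.isnan(g2)
    n_markers = observed.sum()
    if n_markers == 0:
        raise ValueError("no markers observed in both individuals")
    differences = np.abs(g1[observed] - g2[observed]).sum()
    return 1.0 - differences / (2.0 * n_markers)

# Toy genotypes for two individuals at four markers (hypothetical values).
a = [0, 1, 2, 2]
b = [0, 2, 0, 2]
similarity = ibs_similarity(a, b)
print(similarity)
```

          A matrix of such pairwise values over all individuals is the kind of summary that can feed clustering or stratification checks; estimating identity by descent requires additional modelling beyond this sketch.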

            A global reference for human genetic variation

            The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

              Second-generation PLINK: rising to the challenge of larger and richer datasets

              PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

                Author and article information

                Contributors
                LaramieD@Stanford.edu
                Journal
                Nature Communications (Nat Commun)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 25 July 2019
                Volume: 10
                Article number: 3328
                Affiliations
                [1] Department of Psychiatry and Behavioral Sciences, Stanford University, 401 Quarry Road, Stanford, CA 94305, USA (ISNI 0000000419368956; GRID grid.168010.e)
                [2] Department of Epidemiology, Harvard T.H. Chan School of Public Health, 667 Huntington Ave, Kresge 505, Boston, MA 02115, USA (ISNI 000000041936754X; GRID grid.38142.3c)
                [3] Mailman Research Center, Harvard Medical School, McLean Hospital, 115 Mill St, Belmont, MA 02478, USA (ISNI 0000 0000 8795 072X; GRID grid.240206.2)
                [4] Department of Biology, Stanford University, Herrin 478A, Stanford, CA 94305, USA (ISNI 0000000419368956; GRID grid.168010.e)
                [5] Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, P.O. Box 980003, Richmond, VA 23298, USA (ISNI 0000 0004 0458 8737; GRID grid.224260.0)
                [6] Graduate School of Education, CERAS 510, Stanford University, Stanford, CA 94305, USA (ISNI 0000000419368956; GRID grid.168010.e)
                Author information
                http://orcid.org/0000-0003-1131-661X
                http://orcid.org/0000-0002-4161-2199
                http://orcid.org/0000-0002-3894-9049
                Article
                11112
                DOI: 10.1038/s41467-019-11112-0
                PMCID: PMC6658471
                PMID: 31346163
                7e3ce6a4-ff82-4d1c-b2a5-0b96caf5e48c
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 30 October 2018
                Accepted: 18 June 2019
                Funding
                Funded by: none.
                Keywords
                genome-wide association studies, genetic variation, predictive markers, risk factors
