      Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores

      research-article


          Summary

          Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.

          Abstract

          This work studies the impact of misspecifications in summary statistics from genome-wide association studies. It provides an overview of such misspecifications, how they can arise, and what negative consequences they can have on follow-up analyses. It also investigates possible corrections, with the main goal of improving polygenic scores.

                Author and article information

                Contributors
                Journal
                HGG Adv
                Human Genetics and Genomics Advances
                Elsevier
                2666-2477
                18 August 2022
                13 October 2022
                18 August 2022
                Volume 3, Issue 4, Article 100136
                Affiliations
                [1] National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
                [2] Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
                [3] Department of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France
                [4] Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
                [5] Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark
                Author notes
                Corresponding author: florian.prive.21@gmail.com
                [6] Lead contact

                Article
                PII: S2666-2477(22)00052-5
                DOI: 10.1016/j.xhgg.2022.100136
                PMCID: PMC9465343
                PMID: 36105883
                © 2022 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 21 April 2022
                Accepted: 11 August 2022
                Categories
                Article

                Keywords: GWAS summary statistics, misspecifications, polygenic scores
