Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity?

Abstract

Replicable genetic association signals have consistently been found through genome-wide association studies in recent years. The recent dramatic expansion of study sizes improves power of estimation of effect sizes, genomic prediction, causal inference, and polygenic selection, but it simultaneously increases susceptibility of these methods to bias due to subtle population structure. Standard methods using genetic principal components to correct for structure might not always be appropriate and we use a simulation study to illustrate when correction might be ineffective for avoiding biases. New methods such as trans-ethnic modeling and chromosome painting allow for a richer understanding of the relationship between traits and population structure. We illustrate the arguments using real examples (stroke and educational attainment) and provide a more nuanced understanding of population structure, which is set to be revisited as a critical aspect of future analyses in genetic epidemiology. We also make simple recommendations for how problems can be avoided in the future. Our results have particular importance for the implementation of GWAS meta-analysis, for prediction of traits, and for causal inference.
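The kind of bias the abstract describes, and the standard principal-component correction, can be illustrated with a toy simulation. The sketch below is not the authors' simulation study: it assumes two discrete populations with allele frequencies drawn from a Balding-Nichols model, a trait with no genetic effect at all that differs between populations for purely environmental reasons, and a simple per-SNP linear regression test. All function names and parameter values (`simulate_two_populations`, `fst=0.05`, `env_shift=1.0`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_two_populations(n_per_pop=500, n_snps=500, fst=0.05,
                             env_shift=1.0, seed=0):
    """Genotypes for two populations and a trait with NO causal SNPs."""
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, n_snps)            # ancestral allele frequencies
    a = p_anc * (1 - fst) / fst                      # Balding-Nichols beta parameters
    b = (1 - p_anc) * (1 - fst) / fst
    freqs = rng.beta(a, b, size=(2, n_snps))         # one frequency vector per population
    pop = np.repeat([0, 1], n_per_pop)               # population label of each individual
    G = rng.binomial(2, freqs[pop]).astype(float)    # genotypes coded 0/1/2
    G = G[:, G.std(axis=0) > 0]                      # drop monomorphic SNPs
    # The trait differs between populations only through a non-genetic shift.
    y = env_shift * pop + rng.normal(size=pop.size)
    return G, y

def top_pcs(G, k=1):
    """Top k principal components of the standardized genotype matrix."""
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def residualize(x, covars):
    """Remove an intercept and the covariates from x by least squares."""
    design = np.hstack([np.ones((covars.shape[0], 1)), covars])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ beta

def mean_chi2(G, y, covars=None):
    """Mean per-SNP association statistic (squared t); about 1 under the null."""
    n = len(y)
    if covars is None:
        covars = np.empty((n, 0))
    Gr, yr = residualize(G, covars), residualize(y, covars)
    r = (Gr.T @ yr) / np.sqrt((Gr ** 2).sum(axis=0) * (yr ** 2).sum())
    dof = n - covars.shape[1] - 2
    return float(np.mean(dof * r ** 2 / (1 - r ** 2)))

G, y = simulate_two_populations()
naive = mean_chi2(G, y)                      # no correction for structure
adjusted = mean_chi2(G, y, top_pcs(G, k=1))  # first PC included as a covariate
print(f"mean chi2 without PCs: {naive:.2f}, with 1 PC: {adjusted:.2f}")
```

In this deliberately easy setting the uncorrected statistics are inflated well above their null expectation of about 1, and adjusting for the first principal component brings them back close to it. The sketch shows only the case where correction works; it does not reproduce the subtler structure for which the abstract argues principal components may be ineffective.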

Author and article information

Contributors

Daniel John Lawson:

ORCID: http://orcid.org/0000-0002-5311-6213

dan.lawson@bristol.ac.uk

Journal

Journal ID (nlm-ta): Hum Genet

Journal ID (iso-abbrev): Hum. Genet

Title: Human Genetics

Publisher: Springer Berlin Heidelberg (Berlin/Heidelberg)

ISSN (Print): 0340-6717

ISSN (Electronic): 1432-1203

Publication date (Electronic): 27 April 2019

Publication date PMC-release: 27 April 2019

Publication date (Print): 2020

Volume: 139

Issue: 1

Pages: 23-41

Affiliations

[1] MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK (GRID grid.5337.2; ISNI 0000 0004 1936 7603)

[2] Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, Gower Street, London, WC1E 6BT, UK (GRID grid.83440.3b; ISNI 0000000121901201)

Article

Publisher ID: 2014

DOI: 10.1007/s00439-019-02014-8

PMC ID: 6942007

PubMed ID: 31030318

SO-VID: 76e22bfb-5b6c-43be-857a-d2892221e9c3

License:

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

History

Date received: 24 November 2018

Date accepted: 12 April 2019

Funding

Funded by: Wellcome Trust (GB)

Award ID: WT104125MA

Award Recipient: Daniel John Lawson

Funded by: Medical Research Council (FundRef http://dx.doi.org/10.13039/501100000265)

Award ID: MC_UU_00011/1

Custom metadata

ScienceOpen disciplines: Genetics
