Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

Abstract

Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.

Most cited references 19

The nature of nurture: Effects of parental genotypes

Augustine Kong, Gudmar Thorleifsson, Michael L. Frigge … (2018)

Sequence variants in the parental genomes that are not transmitted to a child (the proband) are often ignored in genetic studies. Here we show that nontransmitted alleles can affect a child through their impacts on the parents and other relatives, a phenomenon we call "genetic nurture." Using results from a meta-analysis of educational attainment, we find that the polygenic score computed for the nontransmitted alleles of 21,637 probands with at least one parent genotyped has an estimated effect on the educational attainment of the proband that is 29.9% (P = 1.6 × 10⁻¹⁴) of that of the transmitted polygenic score. Genetic nurturing effects of this polygenic score extend to other traits. Paternal and maternal polygenic scores have similar effects on educational attainment, but mothers contribute more than fathers to nutrition- and health-related traits.

GWAS of 126,559 individuals identifies genetic variants associated with educational attainment.

Cornelius Rietveld, Sarah E. Medland, Jaime Derringer … (2013)

A genome-wide association study (GWAS) of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent single-nucleotide polymorphisms (SNPs) are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effect sizes are small (coefficient of determination R² ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.

Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence

Suzanne Sniekers, Sven Stringer, Kyoko Watanabe … (2017)

Intelligence is associated with important economic and health-related life outcomes 1. Despite substantial heritability 2 (0.54) and confirmed polygenic nature, initial genetic studies were mostly underpowered 3–5. Here we report a meta-analysis for intelligence of 78,308 individuals. We identify 336 single nucleotide polymorphisms (SNPs) (METAL P 500,000 participants.

All participants provided written informed consent; the UK Biobank received ethical approval from the National Research Ethics Service Committee North West–Haydock (reference 11/NW/0382), and all study procedures were performed in accordance with the World Medical Association Declaration of Helsinki ethical principles for medical research. The current study was conducted under the UK Biobank application number 16406. The study design of the UK Biobank has been described in detail elsewhere 35,36.

Briefly, invitation letters were sent out in 2006–2010 to ~9.2 million individuals including all people aged 40–69 years who were registered with the National Health Service and living up to ~25 miles from one of the 22 study assessment centers. A total of 503,325 participants were subsequently recruited into the study 35. Apart from registry based phenotypic information, extensive self-reported baseline data have been collected by questionnaire, in addition to anthropometric assessments and DNA collection.

For the present study we used imputed data obtained from UK Biobank (May 2015 release) including ~73 million genetic variants in 152,249 individuals. Details on the data are provided elsewhere (see URLs). In summary, the first ~50,000 samples were genotyped on the UK BiLEVE Axiom array, and the remaining ~100,000 samples were genotyped on the UK Biobank Axiom array. After standard quality control of the SNPs and samples, which was centrally performed by UK Biobank, the dataset comprised 641,018 autosomal SNPs in 152,256 samples for phasing and imputation.
Imputation was performed with a reference panel that included the UK10K haplotype panel and the 1000 Genomes Project Phase 3 reference panel.

We used two fluid intelligence phenotypes from the Biobank data set. These are based on questionnaires that were taken either in the assessment center at the initial intake (‘touchscreen’, field 20016) or at a later moment at home (‘web-based’, field 20191). The measures indicate the number of correct answers out of 13 fluid intelligence questions. The data distribution roughly approximates a normal distribution.

For the analyses in our study, we only included individuals of Caucasian descent. After removal of related individuals, discordant sex, withdrawn consent, and missing phenotype data, 36,257 individuals remained for analysis for the fluid intelligence touchscreen measure and 28,846 for the web-based version. As 10,984 individuals had taken both the touchscreen and the web-based test, we only included the data from the touchscreen test for these individuals. This resulted in 54,119 individuals with a score on either the fluid intelligence web-based (UKB-wb) or touchscreen (UKB-ts) version (Supplementary Table 1). At the time of taking the test, participants’ ages ranged between 40 and 78. Half of the participants were between 40 and 60 years old, 44% between 60 and 70 and 6% were older than 70. The mean age was 58.98 with a standard deviation of 8.19.

Summary statistics from CHIC consortium

We downloaded the publicly available combined GWAS results from the meta-analyses as reported by CHIC 5 (see URLs). Details on the included cohorts and performed analyses are reported in the original publication 5.
Briefly, CHIC includes 6 cohorts totaling 12,441 individuals: the Avon Longitudinal Study of Parents and Children (ALSPAC, N = 5,517), the Lothian Birth Cohorts of 1921 and 1936 (LBC1921, N = 464; LBC1936, N = 947), the Brisbane Adolescent Twin Study subsample of Queensland Institute of Medical Research (QIMR, N = 1,752), the Western Australian Pregnancy Cohort Study (Raine, N = 936), and the Twins Early Development Study (TEDS, N = 2,825). All individuals are children aged between 6–18 years. Within each cohort the cognitive performance measure was adjusted for sex and age and principal components were included to adjust for population stratification. See also Supplementary Table 1.

Full GWAS data from additional cohorts

We used the same additional (non-CHIC) cohorts as described in detail in ref. 7, which included 11,748 individuals from 5 cohorts. In ref. 7, results were only reported for 69 SNPs, as these served as a secondary analysis for a look-up effort. In the current study we use the full genome-wide results from these cohorts. GWAS were conducted in 2013 and summary statistics were obtained from the PIs of the 5 cohorts. The quality control protocol entailed excluding SNPs with MAF 0.01.

Positional annotations for all lead SNPs and SNPs in LD with the lead SNPs were obtained by performing ANNOVAR gene-based annotation using refSeq genes. In addition, CADD scores 38, and RegulomeDB 15 scores were annotated to SNPs by matching chromosome, position, reference and alternative alleles. For each SNP eQTLs were extracted from GTEx (44 tissue types) 39, Blood eQTL browser 40 and BIOS gene-level eQTLs 41. The eQTLs obtained from GTEx were filtered on gene P-value < 0.05 and eQTLs obtained from the other two databases were filtered on FDR < 0.05. The FDR values were provided by GTEx, BIOS and Blood eQTL browser. For GTEx eQTLs, there is one FDR value available per gene-tissue pair.
As such, the FDR is identical for all eQTLs belonging to the same gene-tissue pair. For BIOS and Blood eQTL browser, an FDR value was computed per SNP.

To test whether the SNPs were functionally active by means of histone modifications, we obtained epigenetic data from the NIH Roadmap Epigenomics Mapping Consortium 42 and ENCODE 43. For every 200 bp of the genome a 15-core chromatin state was predicted by a Hidden Markov Model based on 5 histone marks (i.e. H3K4me3, H3K4me1, H3K27me3, H3K9me3, and H3K36me3) for 127 tissue/cell types 44. We annotated chromatin states (15 states in total) to SNPs by matching chromosome and position for every tissue/cell type. We computed the minimum state (1: the most active state) and the consensus state (majority of states) across 127 tissue/cell types for each SNP. Chromatin states were also determined for the 52 genes (47 from the gene-based test + 5 additional genes implicated by single SNP GWAS). For each gene and tissue, the chromatin state was obtained per 200 bp interval in the gene. We then annotated the genes by means of a consensus decision when multiple states were present for a single gene; i.e. the state of the gene was defined as the mode of all states present in the gene.

Tissue expression of genes

RNA sequencing data of 1,641 tissue samples with 45 unique tissue labels was derived from the GTEx consortium 39. This set includes 313 brain samples over 13 unique brain regions (see Supplementary Table 18 for sample size per tissue). Of the 52 genes implicated by either the GWAS or the GWGAS, 44 were included in the GTEx data. Normalization of the data was performed as described previously 45. Briefly, genes with an RPKM (Reads Per Kilobase per Million mapped reads) value smaller than 0.1 in at least 80% of the samples were removed. The remaining genes were log2 transformed (after using a pseudocount of 1), and finally a zero-mean normalization was applied.
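
The three expression-normalization steps described above (low-expression filter, log2 transform with a pseudocount of 1, zero-mean normalization) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name, the dictionary layout, and the toy gene names and values are hypothetical, and the text does not say whether centering was done per gene or per sample, so centering per gene is an assumption here.

```python
import math


def normalize_expression(rpkm, low=0.1, max_low_fraction=0.8):
    """Filter, log-transform, and center a toy expression table.

    rpkm maps a gene name to a list of RPKM values, one per sample
    (hypothetical layout chosen for this sketch).
    """
    normalized = {}
    for gene, values in rpkm.items():
        # Step 1: drop genes with RPKM < 0.1 in at least 80% of samples.
        n_low = sum(1 for v in values if v < low)
        if n_low >= max_low_fraction * len(values):
            continue
        # Step 2: log2 transform after adding a pseudocount of 1.
        logged = [math.log2(v + 1.0) for v in values]
        # Step 3: zero-mean normalization.
        # ASSUMPTION: centering is applied per gene across samples.
        mean = sum(logged) / len(logged)
        normalized[gene] = [x - mean for x in logged]
    return normalized


# Toy data (not from the study).
example = {
    "GENE_A": [0.0, 0.05, 0.0, 0.02, 3.0],  # 4 of 5 samples below 0.1: removed
    "GENE_B": [1.0, 3.0, 7.0, 15.0, 0.0],   # 1 of 5 samples below 0.1: kept
}
result = normalize_expression(example)
print(sorted(result))
print(result["GENE_B"])
```

Adding the pseudocount before the logarithm keeps samples with zero reads finite, which is why the filter on near-zero genes and the transform are usually applied in this order.
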
Proxy-replication in educational attainment

For the replication analysis we used a subset of the data from ref. 21. In particular, we excluded the Erasmus Rucphen Family, the Minnesota Center for Twin and Family Research Study, the Swedish Twin Registry Study, the 23andMe data and all individuals from UK Biobank, to make sure there was no sample overlap with our IQ dataset. Genetic correlation between intelligence and EA in this non-overlapping subsample was rg = 0.73, SE = 0.03, P = 1.4 × 10⁻¹⁶³. The replication analysis was based on the phenotype EduYears, which measures the number of years of schooling completed. A total of 306 out of our 336 top SNPs (and 16 out of 18 independent lead SNPs) were available in the educational attainment sample. We performed a sign concordance analysis for the 16 independent lead SNPs, using the exact binomial test. For each independent signal we determined whether either the lead SNP had a P-value smaller than 0.05/16 in the educational attainment analysis, or another (correlated) top SNP in the same locus if this was not the case. All 47 genes implicated in the GWGAS for intelligence were available for look-up in the EA sample. For each gene we determined whether it had a P-value smaller than 0.05/47 in the EA analysis.

Polygenic Risk Score analysis

We used LDpred 16 to calculate the variance explained in intelligence in independent samples by a polygenic risk score based on our discovery analysis, as well as based on two previous GWAS studies for intelligence 5,6. LDpred adjusts GWAS summary statistics for the effects of linkage disequilibrium (LD) by using an approximate Gibbs sampler that calculates posterior means of effects, conditional on LD information, when calculating polygenic risk scores. We used varying priors for the fraction of SNPs with non-zero effects (prior: 0.01, 0.05, 0.1, 0.5, 1, and an infinitesimal prior). Independent datasets available for PRS analyses are described in the Supplementary Note.
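
The two statistical ingredients of the proxy-replication step above, an exact binomial test for sign concordance and the multiple-testing thresholds 0.05/16 and 0.05/47, can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the test is written as one-sided (the text only says "exact binomial test"), and the count of 14 concordant SNPs is a made-up input, not a result from the study.

```python
from math import comb


def sign_concordance_p(n_concordant, n_total, p_null=0.5):
    """Exact binomial P(X >= n_concordant) under the null of random signs.

    ASSUMPTION: a one-sided test; the source does not state sidedness.
    """
    return sum(
        comb(n_total, k) * p_null**k * (1.0 - p_null) ** (n_total - k)
        for k in range(n_concordant, n_total + 1)
    )


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Thresholds named in the text: 16 independent lead SNPs and 47 genes.
per_snp_threshold = bonferroni_threshold(0.05, 16)
per_gene_threshold = bonferroni_threshold(0.05, 47)

# Hypothetical input: suppose 14 of the 16 lead SNPs had concordant signs.
p_value = sign_concordance_p(14, 16)

print(per_snp_threshold, per_gene_threshold, p_value)
```

The concordance test asks whether effect directions agree more often than the 50% expected by chance, while the Bonferroni thresholds apply separately to the individual SNP and gene look-ups.
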

Author and article information

Journal

Title: Nature Genetics

Abbreviated Title: Nat Genet

Publisher: Springer Nature America, Inc

ISSN (Print): 1061-4036

ISSN (Electronic): 1546-1718

Publication date Created: August 2018

Publication date (Electronic): July 23 2018

Publication date (Print): August 2018

Volume: 50

Issue: 8

Pages: 1112-1121

Article

DOI: 10.1038/s41588-018-0147-3

PMC ID: 6393768

PubMed ID: 30038396

SO-VID: a49fb2bb-f5c8-43f8-aa43-f2d9da77ffd4

License:

http://www.springer.com/tdm
