Genome-wide association studies, which typically report regression coefficients summarizing
the associations of many genetic variants with various traits, are potentially a powerful
source of data for Mendelian randomization investigations. We demonstrate how such
coefficients from multiple variants can be combined in a Mendelian randomization analysis
to estimate the causal effect of a risk factor on an outcome. The bias and efficiency
of estimates based on summarized data are compared to those based on individual-level
data in simulation studies. We investigate the impact of gene–gene interactions, linkage
disequilibrium, and ‘weak instruments’ on these estimates. Both an inverse-variance
weighted average of variant-specific associations and a likelihood-based approach
for summarized data give similar estimates and precision to the two-stage least squares
method for individual-level data, even when there are gene–gene interactions. However,
these summarized data methods overstate precision when variants are in linkage disequilibrium.
If the P-value in a linear regression of the risk factor for each variant is less than
, then weak instrument bias will be small. We use these methods to estimate the causal
effect of low-density lipoprotein cholesterol (LDL-C) on coronary artery disease
using published data on five genetic variants. A 30% reduction in LDL-C is estimated
to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that
Mendelian randomization investigations using summarized data from uncorrelated variants
are about as efficient as those using individual-level data, although the necessary
assumptions cannot be assessed as fully.
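
As an illustration of the inverse-variance weighted average mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used fixed-effect form, in which each variant's ratio estimate (variant-outcome coefficient divided by variant-risk factor coefficient) is weighted by its approximate inverse variance, and it assumes the variants are uncorrelated. The function name `ivw_estimate`, its argument names, and the input numbers are hypothetical and are not taken from any published data set.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted causal estimate from summarized data.

    beta_x: per-variant coefficients for the risk factor
    beta_y: per-variant coefficients for the outcome
    se_y:   standard errors of the outcome coefficients

    Assumes uncorrelated variants (no linkage disequilibrium) and ignores
    uncertainty in beta_x, as in the simple fixed-effect formula.
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Numerator: sum of beta_x * beta_y / se_y^2 over variants.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_x, beta_y, se_y))
    # Denominator: sum of beta_x^2 / se_y^2 over variants.
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y))

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summarized coefficients for three variants (illustration only).
example_beta_x = [0.10, 0.20, 0.30]
example_beta_y = [0.20, 0.40, 0.60]
example_se_y = [0.05, 0.05, 0.05]

estimate, std_error = ivw_estimate(example_beta_x, example_beta_y, example_se_y)
# Approximate 95% confidence interval using a normal approximation.
ci_lower = estimate - 1.96 * std_error
ci_upper = estimate + 1.96 * std_error
```

Because the weights assume independent variant-specific estimates, this sketch would overstate precision if applied to variants in linkage disequilibrium, which is the limitation noted above.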