Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. By using an integrative genomics approach, we highlight how the gene RPS26 and not ERBB3 is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. We also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels in the process.
Genome-wide association studies seek to identify regions of the genome in which changes in DNA in a given population are correlated with disease, drug response, or other phenotypes of interest. However, changes in DNA that associate with traits like common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in the higher-order disease traits. Therefore, identifying molecular phenotypes that vary in response to changes in DNA that also associate with changes in disease traits can provide the functional information necessary to not only identify and validate the susceptibility genes directly affected by changes in DNA, but to understand as well the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. To enable this type of approach we profiled the expression levels of 39,280 transcripts and genotyped 782,476 SNPs in 427 human liver samples, identifying thousands of DNA variants that strongly associated with liver gene expression. These relationships were then leveraged by integrating them with genotypic and expression data from other human and mouse populations, leading to the direct identification of candidate susceptibility genes corresponding to genetic loci identified as key drivers of disease. Our analysis is able to provide much needed functional support for these candidate susceptibility genes.
Identifying changes in DNA that associate with changes in gene expression in human tissues elucidates the genetic architecture of gene expression in human populations and enables the direct identification of functionally supported candidate susceptibility genes in genomic regions associated with disease.