Introduction

For many complex traits and diseases, numerous associated single nucleotide polymorphisms (SNPs) have been identified through genome-wide association studies (GWAS) [1]. For many of these identified variants it is still unclear through which mechanism the association between the SNP and the trait or disease phenotype is mediated. A complicating factor is that disease-associated variants might not be the real causal variants, but might instead be in linkage disequilibrium (LD) with the true disease-causing variant, making it difficult to accurately implicate the correct gene for a locus in disease pathogenesis. Within the major histocompatibility locus (MHC) on 6p, many SNPs have been found to be associated with complex diseases such as celiac disease, inflammatory bowel disease, psoriasis, rheumatoid arthritis, diabetes mellitus, schizophrenia, lung cancer and follicular lymphoma [2]–[10]. An analysis of the Catalog of Published Genome-Wide Association Studies [1] revealed that out of 1,167 unique SNP associations with a reported p […]

[…] For each probe it was determined whether it mapped uniquely to one particular genomic locus or, if multiple hits were present, whether all these mappings resided in each other's vicinity ([…] 5%). This was different for PCs 26–50 in the HT12 data: 11 PCs were under substantial genetic control (Figure S9a). We therefore assumed that most trans-eQTLs could be detected when removing approximately 25 PCs. We quantified this systematically by removing increasing numbers of PCs from the expression data and conducting a full genome-wide trans-eQTL mapping. Indeed, in these analyses at most 244 significant trans-eQTLs could be detected (at FDR 0.05, with potential false positives due to cross-hybridization removed), when removing 25 PCs (Figure S9b).
The overlap with the analysis in which no PCs had been removed was substantial: 62 of the 82 trans-eQTLs (77%) detected in the original analysis were also detected in the analysis with 25 PCs removed (Figure S9c), all with identical allelic directions (Figure S9d).

Identification of false eQTLs due to primer polymorphisms and cross-hybridization

One should be aware that sequence polymorphisms can cause many false cis-eQTLs [53]. Such false cis-eQTLs do not reflect actual expression differences caused by sequence polymorphisms in cis-acting factors that affect mRNA levels. Instead they indicate hybridization differences caused by sequence polymorphisms in the mRNA region that is targeted by the microarray expression probes. Therefore, SNP-probe combinations were excluded from the cis-eQTL analysis when the 50 bp expression probe mapped to a genomic location that contained a known SNP showing at least some LD (r² > 0.1) with the cis-SNP. We used SNP data from the 1000 Genomes Project, as it contains LD information for 9,633,115 SNPs (April 2009 release, based on 57 CEU samples of European descent).

Detected trans-eQTLs might also be false positives. Although we had initially attempted to map the expression probes as accurately as possible, using the aforementioned three mapping strategies, it is still quite possible that some of the identified putative trans-eQTLs in fact reflect very subtle cross-hybridization (e.g. pertaining to only a small subsequence of the probe). We therefore tried to falsify each putative trans-eQTL by attempting to map each trans-probe to the vicinity of the SNP probe location, using a highly relaxed mapping approach. All putative Illumina trans-expression probes were mapped to the human reference genome (NCBI 36.3 build) using SHRiMP [54], which uses a global alignment approach.
The mapping settings were chosen very loosely to permit the identification of nearly all potential hybridization locations: the match score was 10, the mismatch score was 0, the gap open penalty was −250, the gap extension penalty was −100, and the Smith-Waterman minimum identical alignment threshold was 30.0%, while other SHRiMP parameters were left at their defaults. Using these settings, all mappings with a minimum overlap of 15 bases, or with 20 matches with one mismatch, or 30 matches with 2 mismatches, or full-length (50 bp) probe hybridizations with no more than 15 mismatches were accepted. Any trans-eQTL was discarded if the expression probe had a mapping within 2 Mb of the SNP that showed the trans-eQTL effect. Once these potential false-positive trans-eQTLs had been removed from the real, non-permuted data, we repeated the multiple testing correction (again controlling the FDR at 0.05). Using this strategy we observed several instances where only 20 out of the 50 bases of a probe sequence mapped in the vicinity of the trans-SNP (data not shown). For these trans-eQTLs the Spearman's rank correlation p was often lower than 10⁻¹⁰⁰, which would imply that these SNPs explain over 25% of the total expression variation of the corresponding trans-genes. Given the small number of trans-eQTLs we detected in total, such effect sizes are quite unlikely and therefore provide circumstantial evidence that these indeed reflect cross-hybridization artifacts. We also assessed whether any of the Illumina SNPs that constitute trans-eQTLs might map to a different position than the one reported in dbSNP. To this end, we mapped the 50 bp Illumina SNP probe sequences to the genome assembly, permitting up to four mismatches per 50 bp SNP probe sequence. We did not observe any SNP that could map (with some mismatches) to the same chromosome as the trans-probe. It is still possible that some of the trans-eQTLs for which we did not find any evidence of cross-hybridization are nevertheless false positives, e.g.
by missing some cross-hybridizations due to imperfections in the NCBI v36 assembly we used. Although we have identified numerous occasions where a SNP affects two different probes within the same gene in trans, which substantiates the likelihood that these trans-eQTLs are real, providing unequivocal evidence that all our reported trans-eQTLs are real is not straightforward.

Enrichment analysis of trait-associated SNPs and SNPs located within the HLA region

To assess enrichment of trait-associated SNPs, we used a collection of 1,262 unique SNPs from 'A Catalog of Published Genome-Wide Association Studies' (accessed 09 February 2010, and each having at least one reported association p-value […] 0.05, an HWE exact p-value >0.0001 and call-rate >95%. To ascertain whether these SNPs constitute an eQTL more often than expected, we used a methodology that is not affected by the following potential confounders: the uneven distribution of SNP markers and expression probes across the genome, differences in MAF between SNPs, the LD structure within the genotype data, and correlation between probes in the expression data. Additionally, this methodology is not confounded by the fact that for certain traits different SNPs in strong LD may have been reported, due to differences in the platforms that were used to identify these loci. We first determined how many unique eQTL SNPs had been identified in the original eQTL mapping (with an FDR<0.05) and how many of these are trait-associated. Subsequently we permuted the expression phenotypes relative to the genotypes (thus keeping the correlation structure within the genotype data and the correlation structure within the expression data intact, yet assigning the genotypes of a sample to the expression data of a randomly chosen sample) and reran the eQTL mapping, sorting all tested eQTLs from most to least significant.
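The sample permutation described here can be illustrated with a minimal Python sketch. The data layout (dictionaries keyed by sample ID), the toy values, and the function name `permute_expression` are assumptions made for illustration and are not taken from the original analysis. Whole expression profiles are reassigned between samples, so correlations among SNPs and among probes stay intact while the link between genotype and expression is broken:

```python
import random

def permute_expression(expression, rng):
    """Reassign each sample's complete expression profile to a randomly chosen sample.

    expression: dict mapping sample ID -> list of probe values (hypothetical layout).
    Returns a new dict with the same sample IDs but shuffled profiles.
    """
    sample_ids = list(expression)
    shuffled_ids = sample_ids[:]
    rng.shuffle(shuffled_ids)
    # Each profile moves as one unit, so probe-probe correlations are preserved.
    return {sid: expression[src] for sid, src in zip(sample_ids, shuffled_ids)}

# Toy data: four samples, one SNP (allele dosage) and two probes per sample.
genotypes = {"s1": 0, "s2": 1, "s3": 2, "s4": 1}
expression = {"s1": [5.1, 7.9], "s2": [6.0, 8.4], "s3": [6.8, 9.1], "s4": [5.9, 8.6]}

rng = random.Random(42)  # fixed seed only to make the sketch reproducible
permuted = permute_expression(expression, rng)
```

The genotype table is left untouched; rerunning the eQTL mapping on `permuted` would then give one draw from the null distribution.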
We then took an equal number of top associated, but permuted, eQTL SNPs and determined how many of these permuted eQTL SNPs are trait-associated. By performing 100 permutations we obtained an empirical distribution of the number of trait-associated SNPs expected by chance. We subsequently fitted a generalized extreme value distribution (EVD, using the EVD add-on package for R), permitting us to obtain realistic estimates of the enrichment significance (called EVD p throughout the manuscript). For the MHC enrichment analysis the procedure was identical, except that we looked for enrichment of SNPs within the MHC, defined as SNPs physically mapping between 20 Mb and 40 Mb on chromosome 6 (NCBI 36 assembly).

Trans-eQTL replication datasets

Replication of the detected eQTLs was performed in monocytes from 1,490 different samples [45] and in an independent population of 86 morbidly obese individuals who underwent elective bariatric surgery (Department of General Surgery, Maastricht University Medical Centre, the Netherlands). Both datasets used the same Illumina HumanHT-12 expression platform. For the 1,490 monocyte samples, eQTL p-value summary statistics were available for all monocyte trans-eQTLs with a nominal p<1.0×10⁻⁵. We ascertained how many of the trans-eQTLs we had found in our peripheral blood data had a nominal eQTL p<1.0×10⁻⁵ in this monocyte dataset. We also assessed trans-eQTLs in four different tissues from the 86 morbidly obese individuals who underwent bariatric surgery. DNA was extracted from blood samples using the Chemagic Magnetic Separation Module 1 (Chemagen) integrated with a Multiprobe II Pipetting robot (PerkinElmer). All samples were genotyped using both Illumina HumanCytoSNP-12 BeadChips and Illumina HumanOmni1-Quad BeadChips (QC was identical to that applied to the peripheral blood samples). We imputed HapMap 2 genotypes using Impute version 2.0.
In addition, expression profiling was performed for four different tissues from each of these individuals using Illumina HumanHT-12 arrays. Wedge biopsies of liver, visceral adipose tissue (VAT, omentum majus), subcutaneous adipose tissue (SAT, abdominal), and muscle (musculus rectus abdominis) were taken during surgery. RNA was isolated using the Qiagen Lipid Tissue Mini Kit (Qiagen, UK, 74804). RNA quality and concentration were assessed with an Agilent Bioanalyzer (Agilent Technologies, USA). Starting with 200 ng of RNA, the Ambion Illumina TotalPrep Amplification Kit was used for anti-sense RNA synthesis, amplification, and purification according to the protocol provided by the manufacturer (Ambion, USA). 750 ng of complementary RNA was hybridized to Illumina HumanHT-12 BeadChips and scanned on the Illumina BeadArray Reader. Expression data were preprocessed as described above. We first attempted to replicate the trait-associated trans-eQTLs per tissue, using an FDR of 0.05 and 100 permutations. Subsequently we conducted a meta-analysis, combining the four tissues. Per trans-eQTL we used a weighted Z-method to combine the four individual p-values. However, these four datasets are not independent, as they are derived from the same individuals. We resolved this by conducting the permutations in such a way that in every permutation round the samples were permuted in exactly the same way for each of the four tissues. By doing this we retained the correlations that exist between the different tissues per sample, and were able to obtain a realistic empirical null distribution of expected test statistics.

Convergence analysis

Per trait we assessed all the SNPs that have been reported to be associated with that particular trait. We analyzed all possible SNP pairs per trait. If a pair of SNPs was not in LD (r² < 0.001), we assessed whether they affected the same gene in cis or trans.
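The pair-wise check can be sketched as follows in Python. The data layout is an assumption made for illustration: `trait_snps`, `eqtl_genes`, the LD lookup `r2`, and the function name `converging_pairs` are hypothetical, and the toy identifiers are not results from this study. A pair is kept only when the two SNPs are not in LD and share at least one eQTL gene:

```python
from itertools import combinations

def converging_pairs(trait_snps, eqtl_genes, r2, ld_threshold=0.001):
    """Return {(snp_a, snp_b): shared genes} for unlinked SNP pairs of one trait."""
    pairs = {}
    for snp_a, snp_b in combinations(sorted(trait_snps), 2):
        # LD lookup keyed by the sorted pair; missing pairs are assumed unlinked.
        if r2.get((snp_a, snp_b), 0.0) >= ld_threshold:
            continue
        shared = eqtl_genes.get(snp_a, set()) & eqtl_genes.get(snp_b, set())
        if shared:
            pairs[(snp_a, snp_b)] = shared
    return pairs

# Toy example with made-up SNP and gene identifiers.
trait_snps = {"rsA", "rsB", "rsC"}
eqtl_genes = {"rsA": {"GENE1", "GENE2"}, "rsB": {"GENE2"}, "rsC": {"GENE2"}}
r2 = {("rsB", "rsC"): 0.8}  # rsB and rsC are in LD, so that pair is skipped

result = converging_pairs(trait_snps, eqtl_genes, r2)
```

In this toy input, `rsA` pairs with both `rsB` and `rsC` through `GENE2`, while the linked pair `rsB`/`rsC` is excluded. Counting the keys of `result` per trait gives the number of converging pairs that a permutation scheme could then compare against chance.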
When using the trait-associated cis- and trans-eQTLs that had been identified when controlling the FDR at 0.05, we identified 7 unique pairs of SNPs that were associated with the same phenotype and also affected the same gene(s). When using a somewhat more relaxed set of trans-eQTLs, identified when controlling the FDR at 0.5, we identified 18 unique pairs of SNPs that affect the same downstream gene. We assessed whether these numbers were significantly higher than expected by using the same strategy that we had used to assess the enrichment of trait-associated SNPs and of SNPs within the HLA; we ran 100 permutations. Per permutation we kept the cis-eQTL list as it was, but generated a permuted set of trans-eQTLs, equal in size to the original set of non-permuted trans-eQTLs. This enabled us to determine per permutation round how many unique pairs of SNPs converge on the same gene(s). We subsequently fitted a generalized extreme value distribution, permitting us to obtain realistic estimates of the enrichment significance.

Co-expression between genes, based on HT12 peripheral blood co-expression

If a particular SNP is cis- or trans-acting on multiple genes, it is plausible that those genes are biologically related. Co-expression between these genes provides circumstantial evidence that this is the case, strengthening the likelihood that such cis- and trans-eQTLs are real. We assessed this in the peripheral blood data, using the expression data of the 1,240 samples run on the comprehensive HT12 expression platform. As we had removed 25 PCs (to remove physiological and environmental variation, and systematic experimental variation) for the trans-eQTL analyses, we decided to confine the co-expression analyses to this expression dataset. As there are 43,202 HT12 probes that we mapped to a known genomic location, 43,202×43,201/2 = 933,184,801 probe-pairs exist.
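The probe-pair count, and the correlation threshold that a Bonferroni correction over that many pairs implies, can be checked with a few lines of Python. This is a sketch under an assumption: it uses the Fisher z-transformation with a normal approximation, which is not necessarily how the threshold was derived in the original analysis, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n_probes = 43_202
n_samples = 1_240

# Number of unordered probe pairs: n * (n - 1) / 2.
n_pairs = math.comb(n_probes, 2)

# Bonferroni-corrected two-sided significance level per probe pair.
alpha_per_pair = 0.05 / n_pairs

# Critical |z| under the normal approximation (lower tail, sign flipped).
z_crit = -NormalDist().inv_cdf(alpha_per_pair / 2)

# Fisher z-transformation: atanh(r) * sqrt(n - 3) is approximately standard normal.
r_crit = math.tanh(z_crit / math.sqrt(n_samples - 3))

print(n_pairs)  # 933184801
print(r_crit)   # critical |r| under this approximation
```

Under this approximation the critical correlation falls between 0.18 and 0.19, so any pair with r of at least 0.19 clears the corrected threshold.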
Given 1,240 samples, a Pearson correlation coefficient r ≥ 0.19 corresponds to p < 0.05 when applying stringent Bonferroni correction for this number of probe-pairs.

Accession numbers

Expression data for both the peripheral blood and the four non-blood datasets have been deposited in GEO with accession numbers GSE20142 (1,240 peripheral blood samples, hybridized to HT12 arrays), GSE20332 (229 peripheral blood samples, hybridized to H8v2 arrays) and GSE22070 (subcutaneous adipose, visceral adipose, muscle and liver samples).

Supporting Information

Figure S1 Detected cis- and trans-eQTLs in genome-wide analysis. (TIF)
Figure S2 Detected cis- and trans-eQTLs for 1,167 trait-associated SNPs. (TIF)
Figure S3 Detected cis- and trans-eQTLs per complex trait. Immune-related and hematological associated SNPs often affect gene expression in cis or trans. (TIF)
Figure S4 Co-expression distribution between eQTL genes for mean platelet volume and mean corpuscular volume. (PDF)
Figure S5 Replication of trans-eQTLs in four non-blood tissues. (TIF)
Figure S6 Principal components used as covariates in analyses. (PDF)
Figure S7 Effect of removing principal components from expression data on detectability of cis-eQTLs. (TIF)
Figure S8 Significance of detected cis-eQTLs before and after removal of principal components from expression data. (TIF)
Figure S9 Effect of removing principal components from expression data on detectability of trans-eQTLs. (TIF)
Table S1 Detected cis-eQTLs (FDR 0.05) for all common SNPs. (XLSX)
Table S2 Detected trans-eQTLs (FDR 0.05) for all common SNPs. (XLS)
Table S3 Detected cis-eQTLs (FDR 0.05) for 1,167 trait-associated SNPs. (XLS)
Table S4 Detected trans-eQTLs (FDR 0.05) for 1,167 trait-associated SNPs. (XLS)
Table S5 Detected cis- and trans-eQTLs (FDR 0.05) per complex trait. (XLS)
Table S6 Plots of detected trans-eQTLs for 1,167 trait-associated SNPs for each of the seven individual cohorts of samples that make up the total of 1,469 peripheral blood samples. (PDF)
Table S7 Detected trans-eQTLs (FDR 0.50) for 1,167 trait-associated SNPs. (XLS)
Table S8 Replicated trans-eQTLs in monocyte eQTL dataset. (XLS)
Table S9 Characteristics of subcutaneous adipose, visceral adipose, muscle and liver datasets. (XLS)
Table S10 Replicated trans-eQTLs in subcutaneous adipose, visceral adipose, muscle and liver datasets. (XLS)
Table S11 Characteristics of peripheral blood expression data. (XLS)