Technological and scientific advances, stemming in large part from the Human Genome
and HapMap projects, have made large-scale, genome-wide investigations feasible and
cost effective. These advances have the potential to dramatically impact drug discovery
and development by enabling the identification of genetic factors that contribute to
variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and
adverse drug reactions.
Despite these technological advances, their successful application in biomedical research
is limited without access to suitable sample collections. To facilitate exploratory
genetics research, we have assembled a DNA resource from a large number of subjects
participating in multiple studies throughout the world. This growing resource was
initially genotyped with a commercially available genome-wide panel of 500,000
single-nucleotide polymorphisms (SNPs). The collection includes nearly 6,000 subjects of African-American,
East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation
identified via principal-component analysis (PCA) of these data confirm the overall
integrity of the data and highlight important features of the genetic structure of
diverse populations. The potential value of such extensively genotyped collections
is illustrated by the selection of genetically matched population controls in a genome-wide
analysis of abacavir-associated hypersensitivity reactions. We find that matching on
country of origin, on identity-by-state (IBS) distance, or on multidimensional PCA
controls the type I error rate similarly well. The genotype and demographic data from this
reference sample are freely available through the NCBI database of Genotypes and Phenotypes
(dbGaP).
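To make concrete what matching controls on identity-by-state distance involves, the following is a minimal Python sketch. It is not the procedure used in this study. It assumes genotypes coded as 0/1/2 allele counts with missing calls already removed, defines IBS distance as one minus the mean proportion of alleles shared IBS across SNPs, and assigns controls with a simple greedy nearest-neighbour rule. The function names `ibs_distance` and `match_controls`, and the toy genotypes, are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical coding: one allele count (0, 1, or 2) per SNP, no missing calls.
Genotype = Sequence[int]


def ibs_distance(a: Genotype, b: Genotype) -> float:
    """Return 1 minus the mean proportion of alleles shared identical-by-state.

    At each SNP two individuals share 2 - |a_i - b_i| alleles IBS:
    2 for identical genotypes, 1 when one allele is in common,
    0 for opposite homozygotes.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return 1.0 - shared / (2 * len(a))


def match_controls(
    cases: Dict[str, Genotype],
    controls: Dict[str, Genotype],
    k: int = 1,
) -> Dict[str, List[str]]:
    """Greedily give each case its k nearest still-unused controls by IBS distance.

    Cases are processed in dictionary order; ties are broken by control ID.
    This is an illustrative rule, not necessarily the matching used in the study.
    """
    available = dict(controls)
    matches: Dict[str, List[str]] = {}
    for case_id, case_geno in cases.items():
        ranked = sorted(
            available,
            key=lambda cid: (ibs_distance(case_geno, available[cid]), cid),
        )
        chosen = ranked[:k]
        for cid in chosen:
            del available[cid]  # each control is used at most once
        matches[case_id] = chosen
    return matches


# Toy example with four SNPs per individual (made-up genotypes).
cases = {"case1": [0, 1, 2, 2], "case2": [2, 2, 0, 1]}
controls = {"ctlA": [2, 2, 0, 0], "ctlB": [0, 1, 2, 1], "ctlC": [1, 1, 1, 1]}
print(match_controls(cases, controls))  # {'case1': ['ctlB'], 'case2': ['ctlA']}
```

Greedy matching depends on the order in which cases are processed; an optimal one-to-one assignment (for example, via the Hungarian algorithm) avoids that. Swapping `ibs_distance` for a Euclidean distance on the leading principal-component coordinates would give the PCA-based variant of the same idea.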