The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor–based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
The Genographic Project was launched in 2005 to address anthropological questions on a global scale using genetics as a tool. Samples are collected in two ways. First, the project comprises a consortium of ten scientific teams from around the world united by a core ethical and scientific framework that is responsible for sample collection and analysis in their respective region. Second, the project promotes public participation in countries around the world and anyone can participate by purchasing a participation kit ( Video S1). The mitochondrial DNA (mtDNA), typed in female participants, is inherited from the mother without recombining, being particularly informative with respect to maternal ancestry. Over the first 18 months of public participation in the project we have built up the largest to date database of mtDNA variants, containing 78,590 entries from around the world. Here, we describe the procedures used to generate, manage, and analyze the genetic data, and the first insights from them. We can understand new aspects of the structure of the mtDNA tree and develop much better ways of classifying mtDNA. We therefore now release this dataset and the new methods we have developed, and will continue to update them as more people join the Genographic Project.