Abstract
Clinical adoption of human genome sequencing requires methods with known accuracy
of genotype calls at millions or billions of positions across a genome. Previous work
showing discordance amongst sequencing methods and algorithms has made clear the need
for a highly accurate set of genotypes across a whole genome that could be used as
a benchmark. We present methods to make highly confident SNP, indel, and homozygous
reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle
Consortium. We minimize bias towards any method by integrating and arbitrating between
14 datasets from 5 sequencing technologies, 7 mappers, and 3 variant callers. Regions
for which no confident genotype call could be made are identified as uncertain and
classified by the reason for uncertainty. Our highly confident genotype calls
are publicly available on the Genome Comparison and Analytic Testing (GCAT) website
to enable real-time benchmarking of any method.
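
As a rough illustration of how a benchmark call set of this kind might be used, the sketch below compares a method's genotype calls against high-confidence calls, counting only positions that fall inside confident regions. The function name, the data layout, and the toy values are all hypothetical; they are not taken from the paper or from the GCAT website, and real comparisons would normally work on VCF and BED files with a dedicated tool.

```python
def benchmark(test_calls, truth_calls, confident_regions):
    """Compare test genotypes with benchmark genotypes inside confident regions.

    test_calls / truth_calls: dict mapping (chrom, pos) -> genotype string.
    A position absent from a dict is treated as homozygous reference.
    confident_regions: list of (chrom, start, end) half-open intervals.
    """
    def in_confident(chrom, pos):
        return any(c == chrom and s <= pos < e for c, s, e in confident_regions)

    tp = fp = fn = 0
    for site in set(test_calls) | set(truth_calls):
        if not in_confident(*site):
            continue  # uncertain region: excluded from the comparison
        test_gt = test_calls.get(site)
        truth_gt = truth_calls.get(site)
        if test_gt == truth_gt:
            tp += 1
        else:
            # Assumed convention: a genotype mismatch at a shared site
            # counts as both a false positive and a false negative.
            if test_gt is not None:
                fp += 1
            if truth_gt is not None:
                fn += 1
    return {"tp": tp, "fp": fp, "fn": fn}


# Toy data (hypothetical values, for illustration only).
truth = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr1", 900): "0/1"}
test = {("chr1", 100): "0/1", ("chr1", 200): "0/1",
        ("chr1", 300): "0/1", ("chr1", 950): "1/1"}
regions = [("chr1", 50, 500)]

result = benchmark(test, truth, regions)
print(result)
```

Restricting the comparison to confident regions mirrors the abstract's distinction between confident calls and uncertain regions: calls at positions 900 and 950 in the toy data are ignored rather than scored as errors.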