
      Integrating sequencing datasets to form highly confident SNP and indel genotype calls for a whole human genome

      Preprint


          Abstract

          Clinical adoption of human genome sequencing requires methods with known accuracy of genotype calls at millions or billions of positions across a genome. Previous work showing discordance amongst sequencing methods and algorithms has made clear the need for a highly accurate set of genotypes across a whole genome that could be used as a benchmark. We present methods to make highly confident SNP, indel, and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias towards any method by integrating and arbitrating between 14 datasets from 5 sequencing technologies, 7 mappers, and 3 variant callers. Regions in which no confident genotype call could be made are identified as uncertain and classified by the reason for the uncertainty. Our highly confident genotype calls are publicly available on the Genome Comparison and Analytic Testing (GCAT) website to enable real-time benchmarking of any method.
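The abstract does not spell out the arbitration rules. As a rough illustration only, the Python sketch below shows a much simpler idea of the same general shape: combine per-dataset genotype calls at one site by agreement, and label the site uncertain, with a reason, when too few datasets make a call or when the calls disagree. The function name `integrate_site`, the thresholds `min_support` and `min_agreement`, and the genotype strings are hypothetical choices for this example; this is a plain consensus vote, not the authors' arbitration method.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical genotype encoding (an assumption for this sketch):
# "0/0" = homozygous reference, "0/1" = heterozygous, "1/1" = homozygous variant.
Genotype = str


def integrate_site(
    calls: Dict[str, Optional[Genotype]],
    min_support: int = 2,         # hypothetical: minimum datasets that must make a call
    min_agreement: float = 0.8,   # hypothetical: fraction of calls that must agree
) -> Tuple[str, Optional[Genotype]]:
    """Combine per-dataset genotype calls at one site by simple majority vote.

    `calls` maps a dataset name to its genotype call, or None when that
    dataset has no usable call at the site (e.g. no coverage).
    Returns ("confident", genotype) or ("uncertain: <reason>", None).
    """
    # Ignore datasets that made no call at this site.
    observed = [g for g in calls.values() if g is not None]
    if len(observed) < min_support:
        return ("uncertain: insufficient data", None)

    # Most frequent genotype and how many datasets support it.
    genotype, count = Counter(observed).most_common(1)[0]
    if count / len(observed) < min_agreement:
        return ("uncertain: discordant calls", None)
    return ("confident", genotype)


if __name__ == "__main__":
    print(integrate_site({"A": "0/1", "B": "0/1", "C": "0/1", "D": None}))
    print(integrate_site({"A": "0/1", "B": "1/1", "C": "0/1"}))
    print(integrate_site({"A": None, "B": "0/0"}))
```

With the default thresholds, three agreeing calls give a confident "0/1"; a 2-of-3 split falls below the 0.8 agreement threshold and is reported as discordant; and a single call is reported as insufficient data. Keeping the reason alongside the "uncertain" label mirrors, in a very simplified form, the idea of classifying uncertain regions by why no confident call could be made.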


          Author and article information

          Type: Article
          Dates: 2013-07-17; 2013-12-14
          DOI: 10.1038/nbt.2835
          arXiv: 1307.4661
          Record ID: cddacbca-b041-419e-a9e0-b65d94f2dce7
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          arXiv category: q-bio.GN
          Subject: Genetics
