      Is Open Access

      Extensive sequencing of seven human genomes to characterize benchmark reference materials

      data-paper
      Scientific Data
      Nature Publishing Group
      Next-generation sequencing, Genomics, Genome assembly algorithms, Standards


          Abstract

          The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

                Author and article information

                Journal: Scientific Data (Sci Data)
                Publisher: Nature Publishing Group
                ISSN: 2052-4463
                Published: 07 June 2016
                Volume: 3
                Article number: 160025
                Affiliations
                [1] National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
                [2] Stanford University, Stanford, California 94305, USA
                [3] Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
                [4] Illumina Mission Bay, San Francisco, California 94158, USA
                [5] BioNano Genomics, San Diego, California 92121, USA
                [6] Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
                [7] Complete Genomics Inc., Mountain View, California 94043, USA
                [8] Thermo Fisher Scientific, South San Francisco, California 94080, USA
                [9] Genome Sciences, University of Washington, Seattle, Washington 98105, USA
                [10] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
                [11] PersonalGenomes.org, Boston, Massachusetts 02115, USA
                [12] Harvard Medical School, Boston, Massachusetts 02115, USA
                [13] 10X Genomics, Pleasanton, California 94566, USA
                [14] Department of Medical Genetics, Oslo University Hospital, Kirkeveien 166, Bygg 25, Oslo 0450, Norway
                Author notes
                [a] J.Z. (email: jzook@nist.gov).

                J.M.Z., M.L.S., and G.I.A.B. designed the overall study. J.M.Z., D.C., J.M., L.V., N.S., Z.W., Y.L., N.A., E.H., E.J., R.S., R.M.T., K.Z., Y.F., M.C., C.X., and M.L.S. wrote the manuscript. J.M.Z., D.C., J.M., L.V., and M.L.S. designed, sequenced, and analyzed the Illumina paired-end WGS. J.M.Z., D.C., J.M., N.S., A.S., Z.W., and M.L.S. designed, sequenced, and analyzed the Illumina mate-pair WGS. J.M.Z., D.C., J.M., N.S., A.S., Y.L., F.C., E.J., A.M., and M.L.S. designed, sequenced, and analyzed the Illumina synthetic long read WGS. C.E.M., N.A., and E.H. designed, sequenced, and analyzed the Oxford Nanopore WGS. K.P., W.S., T.L., M.S., Z.D., A.H., and H.C. designed, sequenced, and analyzed the BioNano mapping. R.M.T., C.C.C., and N.G. designed, sequenced, and analyzed the Complete Genomics WGS. K.Z., S.G., F.H., and Y.F. designed, sequenced, and analyzed the Ion Torrent exome sequencing. J.M.Z., J.M., G.D., E.S., R.S., A.B., M.C., and M.L.S. designed, sequenced, and analyzed the PacBio WGS. J.M.Z., A.W.Z., M.B., J.B., P.E., G.M.C., M.L.S., and G.I.A.B. designed the process for selecting the genomes from the PGP. P.M., S.K.-P., G.S..Y.Z., M.S.-L., H.S.O., and P.A.M. designed, sequenced, and analyzed the 10X Genomics data. Y.S. and K.B.R. designed, sequenced, and analyzed the Illumina WES data. J.M.Z., S.S., J.T., and C.X. designed and manage the GIAB FTP site and SRA submissions.

                Article
                Publisher ID: sdata201625
                DOI: 10.1038/sdata.2016.25
                PMCID: PMC4896128
                PMID: 27271295
                ScienceOpen record ID: c09d37b1-f944-4d7e-a382-ecd6b6d2968f
                Copyright © 2016, Macmillan Publishers Limited

                This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0

                Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

                History
                Received: 09 September 2015
                Accepted: 15 March 2016
                Categories
                Data Descriptor

