      Is Open Access

      Assessing structural variation in a personal genome—towards a human reference diploid genome

      research-article


          Abstract

          Background

          Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.
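The idea of merging SV calls from several data types into consensus loci can be illustrated with a minimal sketch. This is not Parliament's actual algorithm: the `Call` record, the source labels, and the simple "any overlap of the same type" merge rule are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL", "INS"
    source: str  # hypothetical label for the data type or caller


def merge_calls(calls):
    """Greedily merge overlapping calls of the same type on the same chromosome."""
    groups = defaultdict(list)
    for c in calls:
        groups[(c.chrom, c.svtype)].append(c)

    merged = []
    for (chrom, svtype), group in sorted(groups.items()):
        group.sort(key=lambda c: c.start)
        start, end, sources = group[0].start, group[0].end, {group[0].source}
        for c in group[1:]:
            if c.start < end:  # overlaps the current cluster
                end = max(end, c.end)
                sources.add(c.source)
            else:  # gap: close the current cluster and start a new one
                merged.append((chrom, start, end, svtype, sorted(sources)))
                start, end, sources = c.start, c.end, {c.source}
        merged.append((chrom, start, end, svtype, sorted(sources)))
    return merged


# Made-up example calls, not data from HS1011.
calls = [
    Call("chr1", 1000, 1500, "DEL", "short_read"),
    Call("chr1", 1200, 1600, "DEL", "long_read"),
    Call("chr1", 5000, 5300, "DEL", "array_cgh"),
    Call("chr2", 1000, 1400, "INS", "long_read"),
]
loci = merge_calls(calls)
multi_source = [locus for locus in loci if len(locus[4]) > 1]
```

Here `multi_source` keeps only loci supported by more than one source, which is one simple way a consensus step might rank candidate events before further validation.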

          Results

          We demonstrate Parliament’s efficacy via integrated analyses of whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific Biosciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.
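The reported counts can be put in proportion with a short calculation. The sketch below uses only the numbers stated in the Results paragraph; the variable names and the choice to treat the 100 bp and 1 Mbp bounds as inclusive are assumptions.

```python
# Counts as reported in the Results paragraph.
candidate_loci = 31_007
supported_svs = 9_777
long_read_only = 3_801

MIN_SIZE_BP = 100
MAX_SIZE_BP = 1_000_000  # 1 Mbp


def in_size_window(length_bp):
    """True if an event length falls in the 100 bp to 1 Mbp window (bounds assumed inclusive)."""
    return MIN_SIZE_BP <= length_bp <= MAX_SIZE_BP


supported_fraction = supported_svs / candidate_loci
long_read_only_fraction = long_read_only / supported_svs

print(f"supported loci: {supported_fraction:.1%}")
print(f"long-read-only SVs: {long_read_only_fraction:.1%}")
```

By these figures, roughly a third of the candidate loci are supported as putative SVs, and a little under two fifths of those supported SVs were identified only with long-read data.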

          Conclusions

          HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                english@bcm.edu
                William.Salerno@bcm.edu
                Oliver.Hampton@bcm.edu
                cgonzagaj@gmail.com
                Shruthi.Ambreth@bcm.edu
                dritter@bcm.edu
                crbeck@bcm.edu
                cdavis@bcm.edu
                dahdouli@bcm.edu
                sma@dnanexus.com
                acarroll@dnanexus.com
                Narayanan.Veeraraghavan@bcm.edu
                jeremy@spiralgenetics.com
                becky@spiralgenetics.com
                ahastie@bionanogenomics.com
                elam@bionanogenomics.com
                Simon.White@bcm.edu
                Pamela.Mishra@bcm.edu
                mwang@bcm.edu
                yhan@bcm.edu
                zhangfeng@fudan.edu.cn
                pawels@bcm.edu
                wheeler@bcm.edu
                jeffrey.reid@regeneron.com
                donnam@bcm.edu
                jr13@bcm.edu
                sabo@bcm.edu
                kworley@bcm.edu
                jlupski@bcm.edu
                Eric.Boerwinkle@uth.tmc.edu
                agibbs@bcm.edu
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN: 1471-2164
                Published: 11 April 2015
                Volume 16, Issue 1, Article 286
                Affiliations
                Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
                Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
                DNAnexus, Mountain View, CA 94040 USA
                Spiral Genetics Inc, Seattle, WA 98117 USA
                BioNano Genomics Inc, San Diego, CA 92121 USA
                Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438 China
                Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030 USA
                Texas Children’s Hospital, Houston, TX 77030 USA
                Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
                Article
                1479
                10.1186/s12864-015-1479-3
                4490614
                25886820
                0fe8061e-7ec8-4ab1-9479-a9d61cb8daa7
                © English et al. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 24 September 2014
                Accepted: 23 March 2015
                Categories
                Research Article
                Custom metadata
                © The Author(s) 2015

                Genetics
                structural variation, long-read sequencing, sv software
