      SvABA: genome-wide detection of structural variants and indels by local assembly

      research-article


          Abstract

          Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20–300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50–300 bp) SVs.


                Author and article information

                Journal
                Genome Res
                Genome Research
                Cold Spring Harbor Laboratory Press
                1088-9051
                1549-5469
                April 2018
                Volume: 28
                Issue: 4
                Pages: 581-591
                Affiliations
                [1] The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA;
                [2] Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA;
                [3] Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA;
                [4] Harvard Medical School, Boston, Massachusetts 02115, USA;
                [5] Seven Bridges Genomics, Cambridge, Massachusetts 02142, USA;
                [6] Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom;
                [7] The Finsen Laboratory, Rigshospitalet, University of Copenhagen, DK-2200 Copenhagen, Denmark;
                [8] Tri-Institutional PhD Program in Computational Biology and Medicine, New York, New York 10065, USA;
                [9] New York Genome Center, New York, New York 10013, USA;
                [10] Department of Haematology, University of Cambridge, Cambridge CB2 2XY, United Kingdom;
                [11] Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
                [12] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA;
                [13] Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA;
                [14] Department of Pathology and Laboratory Medicine, Englander Institute for Precision Medicine, Institute for Computational Biomedicine, and Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
                Author notes
                [15] Co-senior authors.

                Author information
                http://orcid.org/0000-0001-6591-1620
                http://orcid.org/0000-0002-7836-4379
                http://orcid.org/0000-0003-2245-9552
                http://orcid.org/0000-0002-6819-5647
                http://orcid.org/0000-0001-9501-391X
                http://orcid.org/0000-0002-3917-5524
                http://orcid.org/0000-0002-5140-6639
                http://orcid.org/0000-0002-0936-0753
                http://orcid.org/0000-0002-9133-8108
                http://orcid.org/0000-0001-8825-7158
                http://orcid.org/0000-0002-2211-4741
                http://orcid.org/0000-0001-6303-3609
                Article
                9509184
                10.1101/gr.221028.117
                5880247
                29535149
                312615ee-8b87-4f76-9c66-ea12098ff7bb
                © 2018 Wala et al.; Published by Cold Spring Harbor Laboratory Press

                This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

                History
                : 1 February 2017
                : 14 February 2018
                Page count
                Pages: 11
                Funding
                Funded by: National Institutes of Health, open-funder-registry 10.13039/100000002;
                Award ID: T32 HG002295/HG/NHGRI
                Award ID: U54CA143798
                Award ID: R01CA188228
                Funded by: DFCI-Novartis Drug Discovery Program
                Funded by: Voices Against Brain Cancer, open-funder-registry 10.13039/100004196;
                Funded by: Pediatric Low-Grade Astrocytoma Foundation, open-funder-registry 10.13039/100006155;
                Funded by: Broad Institute
                Funded by: Cure Starts Now Foundation, open-funder-registry 10.13039/100008221;
                Funded by: Burroughs Wellcome Fund, open-funder-registry 10.13039/100000861;
                Categories
                Method
