Systematic Discovery of Complex Indels in Human Cancers

Ye, Kai; Wang, Jiayin; Jayasinghe, Reyka G.; Lameijer, Eric-Wubbo; McMichael, Joshua F.; Ning, Xin-jie; McLellan, Michael D.; Xie, Mingchao; Cao, Song; Yellapantula, Venkata; Huang, Kuan-lin; Kamradt-Scott, Adam; Foltz, Steven M.; Niu, Beifang; Johnson, Kimberly J; Moed, Matthijs H.; Slagboom, P. Eline; Chen, Jun-Feng; Wendl, Michael C.; Ding, Li

doi:10.1038/nm.4002

research-article

Author(s): Kai Ye ¹,², Jiayin Wang ¹, Reyka Jayasinghe ¹,³, Eric-Wubbo Lameijer ⁴, Joshua F. McMichael ¹, Jie Ning ¹, Michael D. McLellan ¹, Mingchao Xie ¹,³, Song Cao ¹, Venkata Yellapantula ¹,³, Kuan-lin Huang ¹,³, Adam Scott ¹,³, Steven Foltz ¹,³, Beifang Niu ¹, Kimberly J. Johnson ⁵, Matthijs Moed ⁴, P. Eline Slagboom ⁴, Feng Chen ³,⁶, Michael C. Wendl ¹,²,⁷, Li Ding ¹,²,³,⁶

Publication date (Electronic): 14 December 2015

Journal: Nature medicine

Abstract

Complex indels are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here, we present a systematic analysis of somatic complex indels in the coding sequences of over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer genes (e.g., PIK3R1, TP53, ARID1A, GATA3, and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or mis-annotated (17.6%) in 2,199 samples previously reported. In-frame complex indels are enriched in PIK3R1 and EGFR while frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN, and ATRX. Further, complex indels display strong tissue specificity (e.g., VHL from kidney cancer and GATA3 from breast cancer). Finally, structural analyses support findings of previously missed, but potentially druggable mutations in EGFR, MET, and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.
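The definition in the abstract (a deletion and an insertion of different sizes at the same position) and the in-frame versus frameshift distinction can be illustrated with a small sketch. This is not Pindel-C and makes no claim about how that tool works; it only classifies a single reference/alternate allele pair after trimming shared flanking bases. The names `VariantCall`, `trim_shared_bases`, and `classify` are hypothetical, and the frame label assumes the whole event lies inside one coding exon.

```python
from typing import NamedTuple


class VariantCall(NamedTuple):
    kind: str      # "no_change", "insertion", "deletion", "substitution", "complex_indel"
    deleted: int   # reference bases removed after trimming
    inserted: int  # novel bases added after trimming
    frame: str     # "in_frame" or "frameshift" (assumes a fully coding event)


def trim_shared_bases(ref: str, alt: str) -> tuple[str, str]:
    """Remove the prefix, then the suffix, that ref and alt have in common."""
    i = 0
    while i < len(ref) and i < len(alt) and ref[i] == alt[i]:
        i += 1
    ref, alt = ref[i:], alt[i:]
    j = 0
    while j < len(ref) and j < len(alt) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    if j:
        ref, alt = ref[:-j], alt[:-j]
    return ref, alt


def classify(ref: str, alt: str) -> VariantCall:
    """Label an allele pair; 'complex_indel' means both a deletion and an
    insertion remain after trimming and their lengths differ."""
    deleted, inserted = trim_shared_bases(ref.upper(), alt.upper())
    if not deleted and not inserted:
        kind = "no_change"
    elif not deleted:
        kind = "insertion"
    elif not inserted:
        kind = "deletion"
    elif len(deleted) == len(inserted):
        kind = "substitution"  # same-length replacement, not an indel
    else:
        kind = "complex_indel"
    # A net length change that is a multiple of 3 preserves the reading frame.
    net = len(inserted) - len(deleted)
    frame = "in_frame" if net % 3 == 0 else "frameshift"
    return VariantCall(kind, len(deleted), len(inserted), frame)


if __name__ == "__main__":
    # Five bases deleted, two inserted: net change of -3, so the frame is kept.
    print(classify("ACGTAC", "ATT"))
    # Four bases deleted, two inserted: net change of -2, so the frame shifts.
    print(classify("ACGTT", "AGG"))
```

Trimming first matters because a complex indel reported with padded flanking bases can otherwise look like a longer event, or be split into a separate deletion and insertion, which is one way such events end up mis-annotated.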

Author and article information

Journal

Journal ID (nlm-journal-id): 9502015

Journal ID (pubmed-jr-id): 8791

Journal ID (nlm-ta): Nat Med

Journal ID (iso-abbrev): Nat. Med.

Title: Nature medicine

ISSN (Print): 1078-8956

ISSN (Electronic): 1546-170X

Publication date Nihms-submitted: 25 August 2016

Publication date (Electronic): 14 December 2015

Publication date (Print): January 2016

Publication date PMC-release: 30 August 2016

Volume: 22

Issue: 1

Pages: 97-104

Affiliations

[1] McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA

[2] Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA

[3] Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA

[4] Leiden University Medical Center, Leiden, the Netherlands

[5] Brown School Master of Public Health Program, Washington University in St. Louis, St. Louis, MO, USA

[6] Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA

[7] Department of Mathematics, Washington University in St. Louis, St. Louis, MO, USA

Author notes

[#] Corresponding author: Li Ding, Ph.D., lding@genome.wustl.edu

Article

Manuscript ID: NIHMS735489

DOI: 10.1038/nm.4002

PMC ID: 5003782

PubMed ID: 26657142

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
