Integrative Genomics Viewer


Abstract

To the Editor:
Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and non-coding RNAs, SNP and copy number profiling, and functional assays. Analysis of these large, diverse datasets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools that scale to very large datasets and flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of the data pose a significant challenge to the development of such tools.

To address this challenge we developed the Integrative Genomics Viewer (IGV), a lightweight visualization tool that enables intuitive real-time exploration of diverse, large-scale genomic datasets on standard desktop computers. It supports flexible integration of a wide range of genomic data types, including aligned sequence reads, mutations, copy number, RNAi screens, gene expression, methylation, and genomic annotations (Figure S1). IGV uses efficient, multi-resolution file formats to enable real-time exploration of arbitrarily large datasets at all resolution scales while consuming minimal resources on the client computer (see Supplementary Text). Navigation through a dataset is similar to that in Google Maps: the user can zoom and pan seamlessly across the genome at any level of detail, from the whole genome down to a single base pair (Figure S2).
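IGV's file formats themselves are described in the Supplementary Text, not here. To illustrate the general idea behind multi-resolution data and Google Maps-style zooming, the minimal Python sketch below precomputes a pyramid of averaged zoom levels and, for a given window, reads only the level that still gives at least one bin per screen pixel. The power-of-two binning, the mean summary, and the names `build_pyramid`, `choose_level` and `query` are illustrative assumptions, not IGV's actual formats or code.

```python
import math
from typing import List, Tuple


def build_pyramid(values: List[float]) -> List[List[float]]:
    """Precompute zoom levels: level 0 is per-base, level k has bins of 2**k bases.

    Each level stores the mean of adjacent pairs of bins from the level below;
    a trailing unpaired bin is carried up unchanged. (Illustrative scheme only.)
    """
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(
            [sum(prev[i:i + 2]) / len(prev[i:i + 2]) for i in range(0, len(prev), 2)]
        )
    return levels


def choose_level(window_bp: int, pixels: int, n_levels: int) -> int:
    """Coarsest level whose bin size (2**level bp) is no larger than one pixel's span,
    so the window still yields at least one bin per pixel."""
    bp_per_pixel = window_bp / pixels
    if bp_per_pixel <= 1:
        return 0
    return min(math.floor(math.log2(bp_per_pixel)), n_levels - 1)


def query(levels: List[List[float]], start: int, end: int, pixels: int) -> Tuple[int, List[float]]:
    """Return (level, bins) covering [start, end) at a resolution suited to the display width."""
    level = choose_level(end - start, pixels, len(levels))
    bin_size = 2 ** level
    return level, levels[level][start // bin_size: math.ceil(end / bin_size)]


if __name__ == "__main__":
    pyramid = build_pyramid([float(i) for i in range(16)])
    print(query(pyramid, 0, 16, 4))  # a 16 bp window drawn on 4 pixels reads level 2
```

In this sketch the number of values fetched tracks the display width rather than the window size, so a whole-chromosome view costs roughly as much to draw as a zoomed-in one; that is the property that keeps client-side resource use low.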
Datasets can be loaded from local or remote sources, including cloud-based resources, enabling investigators to view their own genomic datasets alongside publicly available data from, for example, The Cancer Genome Atlas (TCGA) [1], 1000 Genomes (www.1000genomes.org/), and ENCODE [2] (www.genome.gov/10005107) projects. In addition, IGV allows collaborators to load and share data locally or remotely over the Web. IGV supports concurrent visualization of diverse data types across hundreds and even thousands of samples, as well as correlation of these integrated datasets with clinical and phenotypic variables. A researcher can define arbitrary sample annotations and associate them with data tracks using a simple tab-delimited file format (see Supplementary Text). These might include, for example, a sample identifier (used to link different types of data for the same patient or tissue sample), phenotype, outcome, cluster membership, or any other clinical or experimental label. Annotations are displayed as a heatmap but, more importantly, are used for grouping, sorting, filtering, and overlaying diverse data types to yield a comprehensive picture of the integrated dataset. This is illustrated in Figure 1, a view of copy number, expression, mutation, and clinical data from 202 glioblastoma samples from the TCGA project in a 3 kb region around the EGFR locus [1,3]. The investigator first grouped samples by tumor subtype, then by data type (copy number and expression), and finally sorted them by median copy number over the EGFR locus. A shared sample identifier links the copy number and expression tracks, maintaining their relative sort order within the subtypes. Mutation data is overlaid on the corresponding copy number and expression tracks, based on shared participant identifier annotations. Several trends in the data stand out, such as a strong correlation between copy number and expression and an overrepresentation of EGFR-amplified samples in the Classical subtype.
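The letter specifies only that sample attributes come from a simple tab-delimited file (details are in the Supplementary Text). The sketch below assumes a header row whose first column holds the sample identifier, and shows the Figure 1 workflow in miniature: group samples by subtype, sort within each group by median copy number over the region in view, reuse that single sample order for every track keyed by the same identifier, and filter on a clinical attribute. The column names, sample IDs and copy-number values are hypothetical, and the functions are illustrative rather than IGV's API.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical attribute file: header row, first column is the sample identifier.
SAMPLE_TABLE = (
    "ID\tSubtype\tSurvivalMonths\n"
    "S1\tClassical\t12\n"
    "S2\tProneural\t30\n"
    "S3\tClassical\t8\n"
    "S4\tMesenchymal\t15\n"
)

# Hypothetical copy-number values (e.g. log2 ratios) over the region in view, per sample.
COPY_NUMBER = {
    "S1": [0.2, 1.8, 2.1],
    "S2": [0.0, 0.1, -0.1],
    "S3": [2.5, 2.9, 3.0],
    "S4": [0.3, 0.4, 0.2],
}


def load_attributes(text: str) -> Dict[str, Dict[str, str]]:
    """Parse a tab-delimited table into {sample_id: {attribute: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    id_column = reader.fieldnames[0]
    return {row[id_column]: row for row in reader}


def median_copy_number(sample: str) -> float:
    """Sort key: median copy number of a sample over the displayed region."""
    return statistics.median(COPY_NUMBER[sample])


def group_and_sort(samples: Iterable[str], attrs: Dict[str, Dict[str, str]],
                   group_by: str, score: Callable[[str], float]) -> Dict[str, List[str]]:
    """Group samples by one attribute (groups in alphabetical order), highest score first."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in samples:
        groups[attrs[sample][group_by]].append(sample)
    return {name: sorted(members, key=score, reverse=True)
            for name, members in sorted(groups.items())}


def filter_samples(samples: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Keep only samples that satisfy a predicate on their attributes."""
    return [s for s in samples if keep(s)]


if __name__ == "__main__":
    attrs = load_attributes(SAMPLE_TABLE)
    groups = group_and_sort(COPY_NUMBER, attrs, "Subtype", median_copy_number)
    # One shared order, applied to every track (copy number, expression, ...) by sample ID.
    track_order = [s for members in groups.values() for s in members]
    print(groups)
    print(filter_samples(track_order, lambda s: int(attrs[s]["SurvivalMonths"]) >= 10))
```

Because the order is computed once from sample IDs, any track that uses the same identifiers (here, copy number and a hypothetical expression track) stays aligned row for row, which is the linking behavior the Figure 1 description relies on.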
IGV’s scalable architecture makes it well suited for genome-wide exploration of next-generation sequencing (NGS) datasets, including both aligned read data and derived results such as read coverage. NGS datasets can approach terabytes in size, so careful data management is necessary to conserve compute resources and to prevent information overload. To this end, IGV varies the displayed level of detail according to resolution scale. At very wide views, such as the whole genome, IGV represents NGS data by a simple coverage plot. Coverage data is often useful for assessing overall quality and diagnosing technical issues in sequencing runs (Figure S3), as well as for analyzing ChIP-Seq [4] and RNA-Seq [5] experiments (Figures S4 and S5). When the user zooms in to a window narrower than ~50 kb, individual aligned reads become visible (Figure 2) and putative SNPs are highlighted as allele counts in the coverage plot. Alignment details for each read are available in popup windows (Figures S6 and S7). Zooming further, individual base mismatches become visible, colored by base call, with intensity reflecting base quality. At this level, the investigator can sort reads by base, quality, strand, sample, and other attributes to assess the evidence for a variant. This type of visual inspection can be an efficient and powerful tool for validating variant calls, eliminating many false positives and aiding confirmation of true findings (Figures S6 and S7). Many sequencing protocols produce reads from both ends (“paired ends”) of genomic fragments with a known size distribution. IGV uses this information to color-code paired ends whose insert sizes are larger than expected, whose mates fall on different chromosomes, or whose pair orientations are unexpected. Such pairs, when consistent across multiple reads, can indicate a genomic rearrangement.
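The paired-end rules above can be sketched as follows. This is a simplified illustration, not IGV's implementation: it assumes a forward-reverse ("FR") library in which the leftmost read of a normal pair maps to the forward strand and its mate to the reverse strand, takes the 99th percentile of observed insert sizes as the "expected" upper bound, and, following the per-chromosome coloring described in the next paragraph, colors an aberrant pair by its mate's chromosome. The record fields loosely mirror SAM columns; the `ReadPair` type, chromosome list and color scheme are hypothetical.

```python
import colorsys
import statistics
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical chromosome list used to give each chromosome its own color.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


@dataclass
class ReadPair:
    """Minimal view of one aligned read and its mate (fields loosely mirror SAM)."""
    chrom: str          # reference name of this read
    pos: int            # leftmost mapped position of this read
    reverse: bool       # True if this read maps to the reverse strand
    mate_chrom: str     # reference name of the mate
    mate_pos: int       # leftmost mapped position of the mate
    mate_reverse: bool  # True if the mate maps to the reverse strand
    tlen: int           # signed observed template (insert) length


def estimate_max_insert(sizes: List[int]) -> float:
    """Assumed rule: the 99th percentile of observed insert sizes is the expected upper bound."""
    return statistics.quantiles(sizes, n=100, method="inclusive")[98]


def classify(pair: ReadPair, max_insert: float) -> str:
    """Label a pair as normal or by the first aberration found."""
    if pair.chrom != pair.mate_chrom:
        return "inter-chromosomal"
    if abs(pair.tlen) > max_insert:
        return "large insert"
    # FR library assumed: leftmost read forward, rightmost read reverse.
    if pair.pos <= pair.mate_pos:
        left_rev, right_rev = pair.reverse, pair.mate_reverse
    else:
        left_rev, right_rev = pair.mate_reverse, pair.reverse
    if left_rev or not right_rev:
        return "unexpected orientation"
    return "normal"


def chrom_color(chrom: str) -> str:
    """Assign each chromosome a distinct hue, returned as a hex RGB string."""
    i = CHROMOSOMES.index(chrom)
    r, g, b = colorsys.hsv_to_rgb(i / len(CHROMOSOMES), 0.8, 0.9)
    return f"#{int(r * 255):02x}{int(g * 255):02x}{int(b * 255):02x}"


def pair_color(pair: ReadPair, max_insert: float) -> Optional[str]:
    """Color aberrant pairs by the mate's chromosome; None means the default color."""
    if classify(pair, max_insert) == "normal":
        return None
    return chrom_color(pair.mate_chrom)
```

With this scheme an aberrant pair whose mate lies on the chromosome being viewed is drawn in that chromosome's own color, while a pair whose mate lies elsewhere takes a different color, so intra- and inter-chromosomal signals separate at a glance.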
When coloring aberrant paired ends, IGV assigns each chromosome a unique color, so that intra-chromosomal (same color) and inter-chromosomal (different color) events are readily distinguished (Figures 2 and S8). We note that misalignments, particularly in repeat regions, can also yield unexpected insert sizes; these can be diagnosed with IGV (Figure S9).

Several stand-alone desktop genome browsers are available today [6], including Artemis [7], EagleView [8], MapView [9], Tablet [10], Savant [11], Apollo [12], and the Integrated Genome Browser [13]. Many of them have features that overlap with IGV, particularly for viewing NGS sequence alignments and genome annotations. The Integrated Genome Browser also supports viewing array-based data. See Supplementary Table 1 and the Supplementary Text for more detail. IGV focuses on the emerging integrative nature of genomic studies, placing equal emphasis on array-based platforms (such as expression and copy-number arrays), next-generation sequencing, and clinical and other sample metadata. Indeed, an important and unique feature of IGV is the ability to view all these data types together and to use the sample metadata to dynamically group, sort, and filter datasets (Figure 1). Another important characteristic of IGV is fast data loading and real-time panning and zooming at all scales of genome resolution and for all dataset sizes, including datasets comprising hundreds of samples. Finally, we have placed great emphasis on ease of installation and use, with the goal of making it straightforward for end users without informatics training to view and share their data.

IGV is open source software and is freely available, with full documentation, at http://www.broadinstitute.org/igv/.


References

Rapid planetesimal formation in turbulent circumstellar discs. Anders Johansen, Jeffrey S. Oishi, Mordecai-Mark Mac Low, et al. (2007).

Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1. Roel Verhaak, Katherine A. Hoadley, Elizabeth Purdom, et al. (2010).

Comprehensive genomic characterization defines human glioblastoma genes and core pathways. (2008).


Author and article information

Journal

Journal ID (nlm-journal-id): 9604648

Journal ID (pubmed-jr-id): 20305

Journal ID (nlm-ta): Nat Biotechnol

Journal ID (iso-abbrev): Nat. Biotechnol.

Title: Nature Biotechnology

ISSN (Print): 1087-0156

ISSN (Electronic): 1546-1696

Publication date (NIHMS submitted): 3 April 2012

Publication date (print): January 2011

Publication date (PMC release): 7 May 2012

Volume: 29

Issue: 1

Pages: 24-26

Affiliations

[1] Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA

[2] Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

[3] Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA

Author notes

Corresponding authors: Jill P. Mesirov (mesirov@broad.mit.edu) and James T. Robinson (jrobinso@broadinstitute.org)

Article

Manuscript ID: nihpa247133

DOI: 10.1038/nbt.1754

PMC ID: 3346182

PubMed ID: 21221095

Funding

Funded by: National Cancer Institute : NCI

Award ID: R21 CA135827-03S1
