Computation for ChIP-seq and RNA-seq studies

Pepke, Shirley; Wold, Barbara J; Mortazavi, Ali

doi:10.1038/nmeth.1371

ScienceOpen: research and publishing network

For Publishers

For Researchers

Blog
About

Search
Advanced search

views

recommends

Record: found
Abstract: not found
Article: not found

Computation for ChIP-seq and RNA-seq studies

Author(s): Shirley Pepke , Barbara Wold , Ali Mortazavi

Publication date (Electronic): October 15 2009

Journal: Nature Methods

Publisher: Springer Science and Business Media LLC

Read this article at

ScienceOpenPublisher PMC

Bookmark

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements comes at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools are emerging to assist in the analysis phase of these projects. Here we describe the multilayered analyses of ChIP-seq and RNA-seq datasets, discuss the software packages currently available to perform tasks at each layer and describe some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery and expression quantification.

Related collections

Most cited references 23

Record: found
Abstract: found
Article: not found

The transcriptional landscape of the yeast genome defined by RNA sequencing.

U Nagalakshmi, Z. Wang, K. Waern … (2008)

The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

0 comments Cited 963 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data.

Chongzhi Zang, Dustin E. Schones, Chen Zeng … (2009)

Chromatin states are the key to gene regulation and cell identity. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) is increasingly being used to map epigenetic states across genomes of diverse species. Chromatin modification profiles are frequently noisy and diffuse, spanning regions ranging from several nucleosomes to large domains of multiple genes. Much of the early work on the identification of ChIP-enriched regions for ChIP-Seq data has focused on identifying localized regions, such as transcription factor binding sites. Bioinformatic tools to identify diffuse domains of ChIP-enriched regions have been lacking. Based on the biological observation that histone modifications tend to cluster to form domains, we present a method that identifies spatial clusters of signals unlikely to appear by chance. This method pools together enrichment information from neighboring nucleosomes to increase sensitivity and specificity. By using genomic-scale analysis, as well as the examination of loci with validated epigenetic states, we demonstrate that this method outperforms existing methods in the identification of ChIP-enriched signals for histone modification profiles. We demonstrate the application of this unbiased method in important issues in ChIP-Seq data analysis, such as data normalization for quantitative comparison of levels of epigenetic modifications across cell types and growth conditions. http://home.gwu.edu/ approximately wpeng/Software.htm. Supplementary data are available at Bioinformatics online.

0 comments Cited 519 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Design and analysis of ChIP-seq experiments for DNA-binding proteins

Peter V. Kharchenko, Michael Tolstorukov, Peter Park (2008)

Recent progress in massively parallel sequencing platforms has allowed for genome-wide measurements of DNA-associated proteins using a combination of chromatin immunoprecipitation and sequencing (ChIP-seq). While a variety of methods exist for analysis of the established microarray alternative (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein binding positions with high accuracy. Using three separate datasets, we illustrate new methods for improving tag alignment and correcting for background signals. We also compare sensitivity and spatial precision of several novel and previously described binding detection algorithms. Finally, we analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.