ChIP-Enrich: gene set enrichment testing for ChIP-seq data

Abstract

Gene set enrichment testing can enhance the biological interpretation of ChIP-seq data. Here, we develop a method for this analysis, ChIP-Enrich, that empirically adjusts for gene locus length (the length of the gene body and its surrounding non-coding sequence). Adjustment for gene locus length is necessary because it is often positively associated with the presence of one or more peaks and because many biologically defined gene sets have an excess of genes with longer or shorter gene locus lengths. Unlike alternative methods, ChIP-Enrich can account for the wide range of gene locus length-to-peak presence relationships observed in ENCODE ChIP-seq data sets. We show that ChIP-Enrich has a well-calibrated type I error rate using permuted ENCODE ChIP-seq data sets; in contrast, two commonly used gene set enrichment methods, Fisher's exact test and the binomial test implemented in the Genomic Regions Enrichment of Annotations Tool (GREAT), can have highly inflated type I error rates and biases in ranking. We identify DNA-binding proteins, including CTCF, JunD and glucocorticoid receptor α (GRα), that show different enrichment patterns for peaks closer to versus further from transcription start sites. We also identify known and potential new biological functions of GRα. ChIP-Enrich is available as a web interface (http://chip-enrich.med.umich.edu) and as a Bioconductor package.
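The confounding described in the abstract can be illustrated with a small simulation. The sketch below is not ChIP-Enrich itself and does not reproduce the authors' model. It only shows, under invented assumptions, how an unadjusted one-sided Fisher's exact test can flag a gene set that is merely made up of long loci. The function names `fisher_upper_tail` and `simulate`, the locus length range, and the length-to-peak probability curve are all hypothetical choices made for illustration.

```python
import math
import random


def fisher_upper_tail(k, K, n, N):
    """One-sided Fisher's exact test p-value, P(X >= k), X ~ Hypergeometric.

    N = genes tested, K = genes with at least one peak,
    n = genes in the gene set, k = gene-set genes with at least one peak.
    """
    total = math.comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 for impossible configurations, so no extra bounds
    # checks are needed on the lower end of the support.
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return tail / total


def simulate(seed=0, n_genes=2000, set_size=100):
    """Return Fisher p-values for a long-locus set and a random set.

    Assumption (hypothetical): peak presence depends only on locus length,
    so neither gene set is truly enriched.
    """
    rng = random.Random(seed)
    # Hypothetical locus lengths, log-uniform between 1 kb and 1 Mb.
    lengths = [10 ** rng.uniform(3, 6) for _ in range(n_genes)]
    # Assumed peak probability: rises linearly in log10(length), 0.1 to 0.9.
    has_peak = [rng.random() < 0.1 + 0.8 * (math.log10(length) - 3) / 3
                for length in lengths]

    by_length = sorted(range(n_genes), key=lambda g: lengths[g])
    long_set = by_length[-set_size:]                    # longest loci only
    random_set = rng.sample(range(n_genes), set_size)   # length-neutral set

    n_with_peak = sum(has_peak)

    def p_value(gene_set):
        k = sum(has_peak[g] for g in gene_set)
        return fisher_upper_tail(k, n_with_peak, len(gene_set), n_genes)

    return p_value(long_set), p_value(random_set)


if __name__ == "__main__":
    p_long, p_random = simulate()
    print(f"long-locus set:  p = {p_long:.3g}")
    print(f"random gene set: p = {p_random:.3g}")
```

In this toy setting the long-locus set receives a very small p-value even though peak presence was generated from locus length alone, which is the kind of inflated type I error that motivates an adjustment for locus length.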

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (hwp): nar

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 1 September 2014

Publication date (Electronic): 30 May 2014

Publication date PMC-release: 30 May 2014

Volume: 42

Issue: 13

Page: e105

Affiliations

[1] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

[2] Biostatistics Department, University of Michigan, Ann Arbor, MI 48109, USA

[3] Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

Author notes

[*] To whom correspondence should be addressed. Tel: +1 734 763 8013; Fax: +1 734 615 6553; Email: sartorma@umich.edu

Correspondence may also be addressed to Laura J. Scott. Tel: +1 734 763 0006; Fax: +1 734 763 2215; Email: ljst@umich.edu

[†] The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

Article

DOI: 10.1093/nar/gku463

PMC ID: 4117744

PubMed ID: 24878920

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 9 May 2014

Date revision received: 7 May 2014

Date received: 23 January 2014

Page count

Pages: 13

Custom metadata

Cover date: 29 July 2014

ScienceOpen disciplines: Genetics
