Integrative analysis of 111 reference human epigenomes.

Abstract

The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

Author and article information

Journal

Journal ID (iso-abbrev): Nature

Title: Nature

Publisher: Springer Nature

ISSN (Electronic): 1476-4687

ISSN (Print): 0028-0836

Publication date (Electronic): Feb 19 2015

Volume: 518

Issue: 7539

Affiliations

[1] (1) Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA. (2) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (3) Department of Genetics, Department of Computer Science, 300 Pasteur Dr., Lane Building, L301, Stanford, California 94305-5120, USA.

[2] (1) Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA. (2) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA.

[3] (1) Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA. (2) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (3) Department of Biological Chemistry, University of California, Los Angeles, 615 Charles E Young Dr South, Los Angeles, California 90095, USA.

[4] Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada.

[5] (1) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (2) Department of Stem Cell and Regenerative Biology, 7 Divinity Ave, Cambridge, Massachusetts 02138, USA.

[6] Epigenome Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

[7] Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, Moores Cancer Center, Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA.

[8] Genomic Analysis Laboratory, Howard Hughes Medical Institute & The Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, California 92037, USA.

[9] Department of Genome Sciences, University of Washington, 3720 15th Ave. NE, Seattle, Washington 98195, USA.

[10] (1) Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA. (2) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (3) Biology Department, Massachusetts Institute of Technology, 31 Ames St, Cambridge, Massachusetts 02142, USA.

[11] The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA.

[12] (1) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (2) The Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St, Cambridge, Massachusetts 02139, USA.

[13] (1) Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, Moores Cancer Center, Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA. (2) Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, California 92093, USA.

[14] Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, 1450 3rd Street, San Francisco, California 94158, USA.

[15] Department of Pathology, University of California San Francisco, 513 Parnassus Avenue, San Francisco, California 94143-0511, USA.

[16] Department of Medicine, Division of Medical Genetics, University of Washington, 2211 Elliot Avenue, Seattle, Washington 98121, USA.

[17] (1) Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA. (2) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (3) Department of Computer Science & Engineering, University of Connecticut, 371 Fairfield Way, Storrs, Connecticut 06269, USA.

[18] Department of Microbiology and Immunology and Centre for High-Throughput Biology, University of British Columbia, 2125 East Mall, Vancouver, British Columbia V6T 1Z4, Canada.

[19] Bioinformatics Group, Department of Molecular Biology, Division of Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000 Zagreb, Croatia.

[20] Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas, Dallas, NSERL, RL10, 800 W Campbell Road, Richardson, Texas 75080, USA.

[21] Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University in St Louis, 4444 Forest Park Ave, St Louis, Missouri 63108, USA.

[22] Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland 4072, Australia.

[23] (1) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (2) Brigham & Women's Hospital, 75 Francis Street, Boston, Massachusetts 02115, USA.

[24] (1) Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University in St Louis, 4444 Forest Park Ave, St Louis, Missouri 63108, USA. (2) Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, USA.

[25] (1) Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York 11794-3600, USA. (2) Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.

[26] Molecular and Human Genetics Department, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

[27] Biology Department, Massachusetts Institute of Technology, 31 Ames St, Cambridge, Massachusetts 02142, USA.

[28] (1) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (2) Brigham & Women's Hospital, 75 Francis Street, Boston, Massachusetts 02115, USA. (3) Harvard Medical School, 25 Shattuck St, Boston, Massachusetts 02115, USA.

[29] Department of Biochemistry, Keck School of Medicine, University of Southern California, 1450 Biggy Street, Los Angeles, California 90089-9601, USA.

[30] ObGyn, Reproductive Sciences, University of California San Francisco, 35 Medical Center Way, San Francisco, California 94143, USA.

[31] Center for Biomolecular Sciences and Engineering, University of Santa Cruz, 1156 High Street, Santa Cruz, California 95064, USA.

[32] (1) Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada. (2) Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada. (3) Department of Medical Genetics, University of British Columbia, 2329 West Mall, Vancouver, BC, Canada, V6T 1Z4.

[33] Dan L. Duncan Cancer Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

[34] (1) Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada. (2) Department of Medical Genetics, University of British Columbia, 2329 West Mall, Vancouver, BC, Canada, V6T 1Z4.

[35] Department of Microbiology and Immunology, Diabetes Center, University of California, San Francisco, 513 Parnassus Ave, San Francisco, California 94143-0534, USA.

[36] (1) University of Wisconsin, Madison, Wisconsin 53715, USA. (2) Morgridge Institute for Research, 330 N. Orchard Street, Madison, Wisconsin 53707, USA.

[37] USDA/ARS Children's Nutrition Research Center, Baylor College of Medicine, 1100 Bates Street, Houston, Texas 77030, USA.

[38] (1) Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas, Dallas, NSERL, RL10, 800 W Campbell Road, Richardson, Texas 75080, USA. (2) Bioinformatics Division, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China.

[39] National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Research Triangle Park, North Carolina 27709, USA.

[40] (1) The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA. (2) Massachusetts General Hospital, 55 Fruit St, Boston, Massachusetts 02114, USA. (3) Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, Maryland 20815-6789, USA.

[41] (1) Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada. (2) Department of Microbiology and Immunology and Centre for High-Throughput Biology, University of British Columbia, 2125 East Mall, Vancouver, British Columbia V6T 1Z4, Canada.

Article

Publisher Item ID: nature14248

Mid ID: NIHMS657700

DOI: 10.1038/nature14248

PMC ID: 4530010

PubMed ID: 25693563

SO-VID: 0d6b40c0-b991-426b-a081-3e3ff780c117

Cited by 2,659
