Species-Level Deconvolution of Metagenome Assemblies with Hi-C–Based Contact Probability Maps

Abstract

Microbial communities consist of mixed populations of organisms, including unknown species in unknown abundances. These communities are often studied through metagenomic shotgun sequencing, but standard library construction methods remove long-range contiguity information; thus, shotgun sequencing and de novo assembly of a metagenome typically yield a collection of contigs that cannot readily be grouped by species. Methods that generate chromatin-level contact probability maps, such as Hi-C, provide a signal of contiguity that is entirely intracellular and carries both intrachromosomal and interchromosomal information. Here, we demonstrate how this signal can be exploited to reconstruct the individual genomes of microbial species present within a mixed sample. We apply this approach to two synthetic metagenome samples, successfully clustering the genome content of fungal, bacterial, and archaeal species with more than 99% agreement with published reference genomes. We also show that the Hi-C signal can secondarily be used to create scaffolded genome assemblies of individual eukaryotic species present within the microbial community, with higher levels of contiguity than some of the species’ published reference genomes.
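To make the core idea concrete, the sketch below groups contigs by their Hi-C links, on the premise stated in the abstract that the contact signal is intracellular and therefore links contigs from the same species far more often than contigs from different species. It is an illustration only, not the authors' algorithm: it swaps in simple single-linkage grouping with a union-find structure, and the normalization by restriction-site counts, the `min_density` cutoff, the function name `cluster_contigs`, and the toy contig names are all assumptions introduced here.

```python
from collections import defaultdict


def cluster_contigs(link_counts, site_counts, min_density):
    """Group contigs whose normalized Hi-C link density meets a cutoff.

    link_counts: dict mapping (contig_a, contig_b) -> number of Hi-C read pairs
    site_counts: dict mapping contig -> restriction-site count (assumed normalizer)
    min_density: hypothetical cutoff on links / (sites_a * sites_b)
    """
    # Union-find forest: every contig starts as its own cluster.
    parent = {contig: contig for contig in site_counts}

    def find(contig):
        # Walk to the root, halving the path along the way.
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for (a, b), n_links in link_counts.items():
        density = n_links / (site_counts[a] * site_counts[b])
        if density >= min_density:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = defaultdict(set)
    for contig in site_counts:
        groups[find(contig)].add(contig)
    return sorted(sorted(members) for members in groups.values())


# Toy data (invented): two "species" with strong internal links
# and one weak spurious link between them.
links = {
    ("s1_c1", "s1_c2"): 90,
    ("s1_c2", "s1_c3"): 60,
    ("s2_c1", "s2_c2"): 80,
    ("s1_c3", "s2_c1"): 2,
}
sites = {"s1_c1": 10, "s1_c2": 10, "s1_c3": 10, "s2_c1": 10, "s2_c2": 10}

clusters = cluster_contigs(links, sites, min_density=0.1)
print(clusters)
```

With the cutoff at 0.1 the spurious cross-species link (density 0.02) is ignored and the toy contigs fall into two groups. Single linkage is sensitive to such spurious links when the cutoff is too low, so a real pipeline would likely prefer a more robust linkage criterion.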

Author and article information

Journal

Journal ID (nlm-ta): G3 (Bethesda)

Journal ID (iso-abbrev): Genetics

Journal ID (hwp): G3: Genes, Genomes, Genetics

Journal ID (pmc): G3: Genes, Genomes, Genetics

Journal ID (publisher-id): G3: Genes, Genomes, Genetics

Title: G3: Genes|Genomes|Genetics

Publisher: Genetics Society of America

ISSN (Electronic): 2160-1836

Publication date (Electronic): 22 May 2014

Publication date Collection: July 2014

Volume: 4

Issue: 7

Pages: 1339-1346

Affiliations

[1]Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065

Author notes

Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.114.011825/-/DC1

Sequencing datasets generated by this study have been deposited in the NCBI Short Read Archive under the accession SRP041431.

[2]Corresponding authors: University of Washington, Foege Building S-403B, Box 355065, 3720 15th Ave NE, Seattle, WA 98195-5065. E-mail: maitreya@uw.edu; and University of Washington, Foege Building S-250, Box 355065, 3720 15th Ave NE, Seattle, WA 98195-5065. E-mail: shendure@uw.edu

Article

Publisher ID: GGG_011825

DOI: 10.1534/g3.114.011825

PMC ID: 4455782

PubMed ID: 24855317

SO-VID: 42ef56a8-d479-4ee1-b1a3-008a91449d19

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 25 April 2014

Date accepted: 22 May 2014

Page count

Pages: 8

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: Hi-C, metagenome assembly, metagenomics, clustering algorithms
