      Species-Level Deconvolution of Metagenome Assemblies with Hi-C–Based Contact Probability Maps


          Microbial communities consist of mixed populations of organisms, including unknown species in unknown abundances. These communities are often studied through metagenomic shotgun sequencing, but standard library construction methods remove long-range contiguity information; thus, shotgun sequencing and de novo assembly of a metagenome typically yield a collection of contigs that cannot readily be grouped by species. Methods that generate chromatin-level contact probability maps, such as Hi-C, provide a signal of contiguity that is completely intracellular and contains both intrachromosomal and interchromosomal information. Here, we demonstrate how this signal can be exploited to reconstruct the individual genomes of microbial species present within a mixed sample. We apply this approach to two synthetic metagenome samples, successfully clustering the genome content of fungal, bacterial, and archaeal species with more than 99% agreement with published reference genomes. We also show that the Hi-C signal can secondarily be used to create scaffolded genome assemblies of individual eukaryotic species present within the microbial community, with higher levels of contiguity than some of the species’ published reference genomes.
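
As a rough illustration of how an intracellular contact signal can group contigs by species, the toy Python sketch below greedily merges contigs by average normalized Hi-C link density. It is not the authors' implementation: the function name `cluster_contigs`, the normalization by restriction-site counts, and the requirement that the caller supply the number of clusters are all assumptions made for this example.

```python
from itertools import combinations


def cluster_contigs(contigs, links, sites, n_clusters):
    """Toy average-linkage agglomerative clustering of contigs.

    contigs    -- list of contig names
    links      -- dict: frozenset({a, b}) -> count of Hi-C read pairs joining a and b
    sites      -- dict: contig name -> restriction-site count (assumed normalizer)
    n_clusters -- number of clusters to stop at (assumed known in this sketch)
    """
    def density(a, b):
        # Link count scaled by the number of possible site pairs.
        return links.get(frozenset((a, b)), 0) / (sites[a] * sites[b])

    def avg_linkage(x, y):
        # Mean pairwise density between two clusters.
        return sum(density(a, b) for a in x for b in y) / (len(x) * len(y))

    clusters = [[c] for c in contigs]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with i < j, so deleting j is safe.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: two species, many within-species links, one spurious link.
demo_links = {
    frozenset(("a1", "a2")): 40,
    frozenset(("b1", "b2")): 30,
    frozenset(("a1", "b1")): 1,
}
demo_sites = {"a1": 10, "a2": 10, "b1": 10, "b2": 10}
demo = cluster_contigs(["a1", "a2", "b1", "b2"], demo_links, demo_sites, 2)
print(demo)  # [['a1', 'a2'], ['b1', 'b2']]
```

Because Hi-C ligation occurs within intact cells, contigs from the same species are expected to share far more links than contigs from different species, which is why even a simple linkage criterion separates the two groups in this contrived input.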


                Author and article information

                G3 (Bethesda)
                G3: Genes, Genomes, Genetics
                Genetics Society of America
                22 May 2014
                July 2014
                Volume: 4
                Issue: 7
                Pages: 1339-1346
                Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065
                Author notes

                Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.114.011825/-/DC1

                Sequencing datasets generated by this study have been deposited in the NCBI Short Read Archive under the accession SRP041431.

                [2] Corresponding authors: University of Washington, Foege Building S-403B, Box 355065, 3720 15th Ave NE, Seattle, WA 98195-5065. E-mail: maitreya@uw.edu; and University of Washington, Foege Building S-250, Box 355065, 3720 15th Ave NE, Seattle, WA 98195-5065. E-mail: shendure@uw.edu
                Copyright © 2014 Burton et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Page count
                Pages: 8
                Keywords: clustering algorithms, metagenomics, metagenome assembly, Hi-C

