Species-Level Deconvolution of Metagenome Assemblies with Hi-C–Based Contact Probability Maps

Abstract

Microbial communities consist of mixed populations of organisms, including unknown species in unknown abundances. These communities are often studied through metagenomic shotgun sequencing, but standard library construction methods remove long-range contiguity information; thus, shotgun sequencing and de novo assembly of a metagenome typically yield a collection of contigs that cannot readily be grouped by species. Methods for generating chromatin-level contact probability maps, such as Hi-C, provide a signal of contiguity that is completely intracellular and contains both intrachromosomal and interchromosomal information. Here, we demonstrate how this signal can be exploited to reconstruct the individual genomes of microbial species present within a mixed sample. We apply this approach to two synthetic metagenome samples, successfully clustering the genome content of fungal, bacterial, and archaeal species with more than 99% agreement with published reference genomes. We also show that the Hi-C signal can secondarily be used to create scaffolded genome assemblies of individual eukaryotic species present within the microbial community, with higher levels of contiguity than some of the species' published reference genomes.
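To make the clustering idea concrete, here is a minimal sketch of grouping contigs by Hi-C contacts. It is not the authors' implementation. It assumes a simple greedy agglomerative scheme in which the number of Hi-C read pairs linking two clusters is divided by the product of their restriction-site counts, and that the number of species k is known in advance. The function names cluster_contigs and agreement, the toy contigs A1 to B2, and the length-weighted definition of "agreement" are hypothetical illustrations, not definitions taken from the paper.

```python
from itertools import combinations


def cluster_contigs(links, sites, k):
    """Greedily merge contigs into k clusters by normalized Hi-C link density.

    Assumed inputs (hypothetical, for illustration):
      links: dict frozenset({a, b}) -> Hi-C read pairs linking contigs a and b
      sites: dict contig -> restriction-site count, used to normalize for size
      k:     number of clusters (species) to stop at, assumed known
    """
    clusters = [{c} for c in sites]

    def density(c1, c2):
        # Links between two clusters divided by the product of their
        # restriction-site totals, so large contigs do not dominate.
        total = sum(links.get(frozenset((a, b)), 0) for a in c1 for b in c2)
        n1 = max(sum(sites[a] for a in c1), 1)
        n2 = max(sum(sites[b] for b in c2), 1)
        return total / (n1 * n2)

    while len(clusters) > k:
        # Find and merge the pair of clusters with the highest density.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: density(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def agreement(clusters, truth, lengths):
    """Fraction of clustered bases whose reference species matches the
    majority species (by length) of their cluster; an assumed metric."""
    agree = 0
    for cluster in clusters:
        per_species = {}
        for contig in cluster:
            sp = truth[contig]
            per_species[sp] = per_species.get(sp, 0) + lengths[contig]
        agree += max(per_species.values())
    return agree / sum(lengths[c] for cluster in clusters for c in cluster)


# Toy data: three contigs from species A, two from species B.
sites = {"A1": 10, "A2": 10, "A3": 10, "B1": 10, "B2": 10}
links = {
    frozenset(("A1", "A2")): 50,
    frozenset(("A2", "A3")): 40,
    frozenset(("A1", "A3")): 30,
    frozenset(("B1", "B2")): 60,
    frozenset(("A1", "B1")): 1,  # a weak, spurious inter-species link
}
truth = {c: c[0] for c in sites}  # species label is the first letter
lengths = {c: 1000 for c in sites}

clusters = cluster_contigs(links, sites, k=2)
print(sorted(sorted(c) for c in clusters))  # [['A1', 'A2', 'A3'], ['B1', 'B2']]
print(agreement(clusters, truth, lengths))  # 1.0
```

Because Hi-C read pairs form only within a single cell, contigs from the same species accumulate dense mutual links while links across species stay sparse, which is why a density-based merge separates A from B here despite the single spurious A1 to B1 link. The restriction-site normalization and the fixed k are simplifying choices for the sketch; a real pipeline would also need to handle contigs with few or no links.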

Author and article information

Affiliations
Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065

Author notes

Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.114.011825/-/DC1

Sequencing datasets generated by this study have been deposited in the NCBI Short Read Archive under the accession SRP041431.

Corresponding authors: University of Washington, Foege Building S-403B, Box 355065, 3720 15th Ave NE, Seattle, WA 98195-5065. E-mail: maitreya@uw.edu; and University of Washington, Foege Building S-250, Box 355065, 3720 15th Ave NE, Seattle, WA 98195-5065. E-mail: shendure@uw.edu
Journal: G3: Genes|Genomes|Genetics (G3 (Bethesda)), Genetics Society of America
ISSN: 2160-1836
Published online: 22 May 2014
Issue date: July 2014
Volume 4, Issue 7, pages 1339-1346 (8 pages)
PMID: 24855317
PMCID: PMC4455782
Publisher ID: GGG_011825
DOI: 10.1534/g3.114.011825
Copyright © 2014 Burton et al.

This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Category: Investigations
Subject: Genetics
Keywords: clustering algorithms, metagenomics, metagenome assembly, Hi-C
