      Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data


          Abstract

          Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of the resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes), all sequencing technologies assembled a similar amount of sequence and accurately represented the expected functional composition. For the more complex community (100 genomes), Illumina produced the best assemblies and more closely resembled the expected functional composition. For the most complex community (400 genomes), there was very little assembly of reads from any sequencing technology. However, due to their longer read length, the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding contigs using paired-end Illumina reads. Scaffolding dramatically increased contig lengths for the simple community and yielded minor improvements for the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.
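The idea of simulating reads from a community with a platform-specific base-error model can be illustrated with a minimal Python sketch. This is not the authors' simulator: the function names, the linear error profile, and all parameter values below are assumptions chosen for illustration only. It draws reads from a toy community in proportion to abundance and genome length, then introduces substitution errors whose probability depends on the position in the read.

```python
import random

BASES = "ACGT"


def illumina_like_error(pos, read_len=75, start_rate=0.001, end_rate=0.05):
    """Hypothetical error profile: substitution rate rises linearly toward the 3' end.

    The rates are illustrative assumptions, not measured platform values.
    Requires read_len >= 2.
    """
    return start_rate + (end_rate - start_rate) * pos / (read_len - 1)


def simulate_reads(genomes, abundances, n_reads, read_len, error_rate_at, seed=0):
    """Sample reads from a toy community and add substitution errors.

    genomes:       dict mapping genome name -> sequence string
                   (each sequence must be at least read_len long)
    abundances:    dict mapping genome name -> relative abundance (positive weight)
    error_rate_at: function mapping a 0-based read position to a substitution probability
    Returns a list of (genome_name, start, read_sequence, n_errors) tuples.
    """
    rng = random.Random(seed)
    names = list(genomes)
    # Weight by abundance times genome length, so that at equal abundance
    # a longer genome contributes proportionally more reads.
    weights = [abundances[name] * len(genomes[name]) for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        n_errors = 0
        for i, base in enumerate(bases):
            if rng.random() < error_rate_at(i):
                # Substitute with one of the other three bases.
                bases[i] = rng.choice([b for b in BASES if b != base])
                n_errors += 1
        reads.append((name, start, "".join(bases), n_errors))
    return reads


if __name__ == "__main__":
    rng = random.Random(1)
    community = {
        "genome_A": "".join(rng.choice(BASES) for _ in range(2000)),
        "genome_B": "".join(rng.choice(BASES) for _ in range(1000)),
    }
    reads = simulate_reads(
        community, {"genome_A": 1.0, "genome_B": 3.0},
        n_reads=5, read_len=75, error_rate_at=illumina_like_error,
    )
    for name, start, seq, n_errors in reads:
        print(name, start, n_errors, seq[:20] + "...")
```

Because each read keeps its source genome and start coordinate, a downstream assembly can be checked against the known origin of every read, which is what makes simulated data useful for assessing accuracy and chimericity. A realistic simulator would also model insertions, deletions, quality scores, and paired ends, which this sketch omits.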

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                ISSN 1932-6203
                23 February 2012; 7(2): e31386
                Affiliations
                [1 ]European Molecular Biology Laboratory, Heidelberg, Germany
                [2 ]Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge Massachusetts, United States of America
                [3 ]VIB, Vrije Universiteit Brussel, Brussels, Belgium
                [4 ]Max Delbrück Centre for Molecular Medicine, Berlin, Germany
                Hospital for Sick Children, Canada
                Author notes

                Conceived and designed the experiments: ASW DRM SS JR PB. Performed the experiments: ASW DRM SS AIJ MMC MA JR. Analyzed the data: ASW DRM SS. Contributed reagents/materials/analysis tools: ASW DRM SS AIJ MMC MA. Wrote the paper: ASW DRM PB.

                Article
                PONE-D-11-20074
                10.1371/journal.pone.0031386
                3285633
                22384016
                Mende et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                : 11 October 2011
                : 6 January 2012
                Page count
                Pages: 11
                Categories
                Research Article
                Biology
                Computational Biology
                Genomics
                Genome Analysis Tools
                Ecology
                Genomics
                Microbiology
                Computer Science
                Computerized Simulations
