
      The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans

      data-paper


          Abstract

          Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we assembled 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available to the research community at large.
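
          The completion counts in the abstract are cumulative: the 603 genomes >90% complete are also counted among the 1,491 that are >70% complete, and both groups fall within the 2,631 total. The Python sketch below shows one way such a tally could be computed and how the cumulative counts split into non-overlapping tiers. The function names and the example completeness values are hypothetical and are not taken from the dataset; only the three reported counts come from the abstract.

```python
def tally_by_completion(completeness):
    """Count genomes meeting each cumulative completion cutoff (percent)."""
    return {
        ">=50%": sum(c >= 50 for c in completeness),
        ">70%": sum(c > 70 for c in completeness),
        ">90%": sum(c > 90 for c in completeness),
    }


def disjoint_tiers(n_ge50, n_gt70, n_gt90):
    """Split cumulative counts into non-overlapping tiers.

    Returns counts for (50% to 70%, >70% to 90%, >90%).
    """
    return (n_ge50 - n_gt70, n_gt70 - n_gt90, n_gt90)


# Hypothetical completeness estimates, for illustration only.
example = [48.2, 50.0, 63.5, 70.0, 71.3, 88.9, 90.0, 95.4]
print(tally_by_completion(example))

# Counts reported in the abstract: 2,631 / 1,491 / 603.
print(disjoint_tiers(2631, 1491, 603))
```

          Under this reading, 1,140 of the draft genomes are between 50% and 70% complete, 888 are above 70% and up to 90% complete, and 603 are above 90% complete.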


                Author and article information

                Journal
                Sci Data
                Scientific Data
                Nature Publishing Group
                2052-4463
                16 January 2018
                Volume 5, Article 170203
                Affiliations
                [1] Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA 90089, USA
                [2] Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
                Author notes
                [a] B.J.T. (email: tully.bj@gmail.com).

                B.J.T. conceived of and designed the methodology, performed the analysis, wrote the paper, and prepared the figure and tables. E.D.G. performed the analysis and reviewed drafts of the paper. J.H.F. provided funding and resources to perform the analysis and reviewed drafts of the paper.

                Author information
                http://orcid.org/0000-0002-9384-7635
                http://orcid.org/0000-0001-5036-1929
                Article
                Publisher ID: sdata2017203
                DOI: 10.1038/sdata.2017.203
                PMCID: PMC5769542
                PMID: 29337314
                Copyright © 2018, The Author(s)

                Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the metadata files made available in this article.

                History
                Received: 17 July 2017
                Accepted: 13 November 2017
                Categories
                Data Descriptor

                genome, metagenomics, water microbiology, bioinformatics
