
      Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models

      Nature methods


          Abstract

          Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is the phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics both easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs (bp) has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 bp, representing a substantial leap forward over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms further improves accuracy.
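          The idea of composition-based classification can be illustrated with a minimal sketch. This is not Phymm's implementation: it assumes a single fixed-order Markov chain per taxon with add-one smoothing, whereas the abstract describes interpolated Markov models, which blend several context lengths. The function names, the order of 3, and the two toy "genomes" are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov_model(genome, order=3):
    """Count how often each base follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(genome) - order):
        context = genome[i:i + order]
        next_base = genome[i + order]
        counts[context][next_base] += 1
    return counts

def log_likelihood(read, counts, order=3):
    """Log-probability of a read under one model, with add-one smoothing."""
    total = 0.0
    for i in range(len(read) - order):
        context = read[i:i + order]
        next_base = read[i + order]
        seen = counts.get(context, {})  # .get avoids creating new entries
        numerator = seen.get(next_base, 0) + 1
        denominator = sum(seen.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total

def classify(read, models, order=3):
    """Return the label whose model gives the read the highest score."""
    return max(models, key=lambda label: log_likelihood(read, models[label], order))

# Toy training data (hypothetical): one AT-rich and one GC-rich sequence.
genome_a = "ATTATAATATTTAATTATATAAT" * 20
genome_b = "GCGGCCGCGCCGGGCGCCGCGGC" * 20
models = {
    "taxon_A": train_markov_model(genome_a),
    "taxon_B": train_markov_model(genome_b),
}

print(classify("AATATTTAATTATA", models))
print(classify("GCCGCGCCGGGCG", models))
```

          In this sketch each read is assigned to the taxon whose model explains its base composition best. A realistic classifier would train on full reference genomes and use longer, interpolated contexts.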


          Most cited references (17)


          Community structure and metabolism through reconstruction of microbial genomes from the environment.

          Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

            Comparative metagenomics of microbial communities.

            The species complexity of microbial communities and challenges in culturing representative isolates make it difficult to obtain assembled genomes. Here we characterize and compare the metabolic capabilities of terrestrial and marine microbial communities using largely unassembled sequence data obtained by shotgun sequencing DNA isolated from the various environments. Quantitative gene content analysis reveals habitat-specific fingerprints that reflect known characteristics of the sampled environments. The identification of environment-specific genes through a gene-centric comparative analysis presents new opportunities for interpreting and diagnosing environments.

              Using MUMmer to identify similar regions in large sequence sets.

              The MUMmer sequence alignment package is a suite of computer programs designed to detect regions of homology in long biological sequences. Version 2.1 makes several improvements to the package, including: increased speed and reduced memory requirements; the ability to handle both protein and DNA sequences; the ability to handle multiple sequence fragments; and new algorithms for clustering together basic matches. The system is particularly efficient at comparing highly similar sequences, such as alternative versions of fragment assemblies or closely related strains of the same bacterium.

                Author and article information

                Journal
                Title: Nature Methods (Nat Methods)
                Journal IDs: 101215604, 32338
                ISSN: 1548-7091, 1548-7105
                Dates: 14 July 2009; 2 August 2009; September 2009; 1 March 2010
                Volume 6, Issue 9, pages 673-676
                Affiliations
                [1] Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742
                Article
                NIHMS ID: nihpa128792
                DOI: 10.1038/nmeth.1358
                PMCID: PMC2762791
                PMID: 19648916
                Funding
                Funded by: National Library of Medicine (NLM), Award ID: R01 LM006845-10
                Funded by: National Institute of General Medical Sciences (NIGMS), Award ID: R01 GM083873-06
                Categories
                Article

                Life sciences
