
      ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

      research-article


          Abstract

          Background

          Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, pose significant threats to human health, and create structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment, and therefore phylogenetic analysis, unreliable across evolutionarily distant fungi, because ITS sequences from distant lineages cannot be aligned accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a “foundation” phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, “extension” phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second, more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names, so that each matching foundation-tree tip branches into its new “extension tree” child.
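
The grafting step described above can be illustrated with a minimal, dependency-free sketch. This is not ghost-tree's own code: the `Node` class, the `graft` and `to_newick` helpers, and the taxon labels (`taxonA`, `A1`, ...) are hypothetical. The sketch also assumes that foundation tips with no matching extension tree are simply left unchanged, which may differ from how the real tool handles them.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """Minimal rooted tree node; tips carry taxon names, branch lengths are optional."""
    name: Optional[str] = None
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def tips(node: Node) -> Iterator[Node]:
    """Yield every tip (leaf) below `node`, left to right."""
    if not node.children:
        yield node
    for child in node.children:
        yield from tips(child)


def graft(foundation: Node, extensions: Dict[str, Node]) -> Node:
    """Hang each extension tree under the foundation tip that shares its taxon name.

    Modifies `foundation` in place. The matched tip keeps its name and branch length
    and becomes the parent of the extension tree's children. Tips with no matching
    extension tree are left unchanged (a simplifying assumption of this sketch).
    """
    for tip in list(tips(foundation)):  # collect tips first, then mutate
        extension = extensions.get(tip.name) if tip.name else None
        if extension is not None:
            tip.children = extension.children
    return foundation


def to_newick(node: Node) -> str:
    """Serialize to Newick without quoting; adequate for the simple labels used here."""
    def fmt(n: Node) -> str:
        text = "(" + ",".join(fmt(c) for c in n.children) + ")" if n.children else ""
        text += n.name or ""
        if n.length is not None:
            text += f":{n.length:g}"
        return text
    return fmt(node) + ";"


# Hypothetical toy data: foundation tips named by higher-level taxa,
# extension trees keyed by the same names.
foundation = Node(children=[Node("taxonA", 0.1), Node("taxonB", 0.2), Node("taxonC", 0.3)])
extensions = {
    "taxonA": Node(children=[Node("A1", 0.01), Node("A2", 0.02)]),
    "taxonB": Node(children=[Node("B1", 0.03), Node("B2", 0.04)]),
}
grafted = graft(foundation, extensions)
print(to_newick(grafted))
# ((A1:0.01,A2:0.02)taxonA:0.1,(B1:0.03,B2:0.04)taxonB:0.2,taxonC:0.3);
```

Here the foundation tip's name survives as an internal-node label. This keeps the higher-level taxon visible in the combined tree, and each extension tree hangs below it.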

          Results

          We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish communities that differ by small effect sizes, although results were similar to those of non-phylogenetic methods for larger effect sizes.
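
A toy example shows why a phylogenetic distance can separate communities that a non-phylogenetic one cannot. The sketch below computes unweighted UniFrac, one widely used phylogenetic beta-diversity metric, and compares it with the Jaccard distance on a hypothetical four-tip tree. The tree encoding, tip names, and helper functions are assumptions for illustration. They are not the study's pipeline, and the abstract does not state which metrics were used.

```python
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

# Toy tree encoding (an assumption of this sketch): (name, branch_length, children).
# Tips have a name and no children; internal nodes are unnamed here.
Tree = Tuple[Optional[str], float, Sequence["Tree"]]


def branches(tree: Tree) -> List[Tuple[float, FrozenSet[str]]]:
    """Return (branch length, tips below the branch) for every non-root branch."""
    out: List[Tuple[float, FrozenSet[str]]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        name, length, children = node
        if children:
            below = frozenset().union(*(walk(child) for child in children))
        else:
            below = frozenset([name])
        out.append((length, below))
        return below

    for child in tree[2]:  # skip the root, which has no parent branch
        walk(child)
    return out


def unweighted_unifrac(tree: Tree, a: Set[str], b: Set[str]) -> float:
    """Fraction of observed branch length that leads to only one of the two communities."""
    unique = observed = 0.0
    for length, below in branches(tree):
        in_a, in_b = bool(below & a), bool(below & b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed if observed else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Non-phylogenetic presence/absence distance: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Hypothetical tree: two clades, each holding two closely related tips.
tree: Tree = (None, 0.0, [
    (None, 1.0, [("s1", 0.1, []), ("s2", 0.1, [])]),
    (None, 1.0, [("s3", 0.1, []), ("s4", 0.1, [])]),
])

close = unweighted_unifrac(tree, {"s1"}, {"s2"})  # sister tips
far = unweighted_unifrac(tree, {"s1"}, {"s3"})    # tips in different clades
print(jaccard({"s1"}, {"s2"}), jaccard({"s1"}, {"s3"}))  # 1.0 1.0
print(round(close, 3), round(far, 3))                     # 0.167 1.0
```

Jaccard rates both pairs as completely different because no taxa are shared. UniFrac scores the sister-tip pair (about 0.167) as much closer than the cross-clade pair (1.0), because the sister tips share most of their evolutionary history.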

          Conclusions

          The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees.

          Availability

          ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s40168-016-0153-6) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                jennietf@gmail.com
                jai.rideout@gmail.com
                ebolyen@gmail.com
                chasejohnh@gmail.com
                shiffy35@gmail.com
                wasade@gmail.com
                robknight@ucsd.edu
                greg.caporaso@gmail.com
                +1 619 206 8014, skelley@mail.sdsu.edu
                Journal
                Microbiome
                BioMed Central (London)
                ISSN: 2049-2618
                Published: 24 February 2016
                Volume 4, Article 11 (2016)
                Affiliations
                Graduate Program in Bioinformatics and Medical Informatics, San Diego State University, San Diego, CA USA
                Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ USA
                Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ USA
                Institute for Systems Biology, Seattle, WA USA
                Department of Pediatrics, and Department of Computer Science and Engineering, University of California San Diego, San Diego, CA USA
                Department of Biology, San Diego State University, San Diego, CA USA
                San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-4614 USA
                Article
                153
                10.1186/s40168-016-0153-6
                4765138
                26905735
                2dfe5f43-a583-4020-940b-acaa4a0ef5db
                © Fouquier et al. 2016

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 18 September 2015
                Accepted: 5 February 2016
                Funding
                Funded by: Alfred P. Sloan Foundation (FundRef http://dx.doi.org/10.13039/100000879)
                Categories
                Software
