ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

Abstract

Background

Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and threaten both human health and the structural integrity of built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal diversity. The high sequence variability of the ITS region facilitates more accurate species identification, but it also makes ITS sequences hard to align accurately across evolutionarily distant fungi, so multiple sequence alignment and phylogenetic analysis become unreliable. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a “foundation” phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, “extension” phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second, more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names, so that each matching foundation-tree tip branches into its “extension tree” child.
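The grafting step described above can be illustrated with a minimal sketch. It is not ghost-tree's actual implementation or API: the nested-dict tree representation, the helper names (`make_node`, `graft`, `tip_names`, `to_newick`), and the taxon and sequence labels are all hypothetical, chosen only to show how a foundation tip matched by name can be replaced with an extension subtree.

```python
# Toy sketch of grafting extension trees onto a foundation tree.
# Trees are nested dicts: {"name": str, "length": float, "children": [...]}.
# Every name and value below is hypothetical; this is not ghost-tree's API.

def make_node(name, length=0.0, children=None):
    """Build one tree node; a node without children is a tip."""
    return {"name": name, "length": length, "children": children or []}

def graft(foundation, extensions):
    """Return a copy of `foundation` in which every tip whose name is a key
    of `extensions` branches into that extension tree's children."""
    children = [graft(child, extensions) for child in foundation["children"]]
    if not children and foundation["name"] in extensions:
        # Tip matched by taxonomic name: copy the extension subtree under it.
        ext_root = extensions[foundation["name"]]
        children = [graft(child, {}) for child in ext_root["children"]]
    return make_node(foundation["name"], foundation["length"], children)

def tip_names(tree):
    """List tip names from left to right."""
    if not tree["children"]:
        return [tree["name"]]
    return [name for child in tree["children"] for name in tip_names(child)]

def to_newick(tree):
    """Serialize the tree as a Newick-style string (without trailing ';')."""
    label = f"{tree['name']}:{tree['length']}"
    if not tree["children"]:
        return label
    return "(" + ",".join(to_newick(c) for c in tree["children"]) + ")" + label

# Foundation tree from a slowly evolving marker (hypothetical taxa).
foundation = make_node("root", 0.0, [
    make_node("Aspergillus", 0.3),
    make_node("Candida", 0.4),
])

# Extension tree from a rapidly evolving marker, keyed by taxonomic name.
extensions = {
    "Aspergillus": make_node("Aspergillus", 0.0, [
        make_node("ITS_seq_1", 0.01),
        make_node("ITS_seq_2", 0.02),
    ]),
}

hybrid = graft(foundation, extensions)
print(to_newick(hybrid))
```

In this sketch, foundation tips without a matching extension tree (here `Candida`) are kept unchanged; how the real tool treats such tips is not stated in the abstract.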

Results

We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes.
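To see why a phylogenetic distance can separate communities that a non-phylogenetic one cannot, consider the toy comparison below. It assumes unweighted UniFrac as the phylogenetic metric and Jaccard as the non-phylogenetic one; the abstract does not name the metrics used, and the tree, tip labels, and samples are hypothetical.

```python
# Toy comparison of a phylogenetic and a non-phylogenetic distance.
# Assumed metrics: unweighted UniFrac vs. Jaccard (not named in the abstract).
# Hypothetical tree ((A:1,B:1):2,(C:1,D:1):2), stored as a list of
# (branch_length, set_of_tips_below_branch) pairs.
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (2.0, {"C", "D"}),
]

def unweighted_unifrac(sample_x, sample_y, branches):
    """Fraction of observed branch length that leads to only one sample.
    Assumes at least one sample contains a tip of the tree."""
    unique = shared = 0.0
    for length, tips in branches:
        in_x = bool(tips & sample_x)
        in_y = bool(tips & sample_y)
        if in_x and in_y:
            shared += length
        elif in_x or in_y:
            unique += length
    return unique / (unique + shared)

def jaccard(sample_x, sample_y):
    """Fraction of observed taxa that occur in only one sample."""
    return 1.0 - len(sample_x & sample_y) / len(sample_x | sample_y)

s1, s2, s3 = {"A"}, {"B"}, {"C"}

# No taxa are shared in either pair, so Jaccard cannot tell them apart.
print(jaccard(s1, s2), jaccard(s1, s3))

# UniFrac sees that A and B are close relatives while A and C are not.
print(unweighted_unifrac(s1, s2, branches), unweighted_unifrac(s1, s3, branches))
```

Both pairs share no taxa, so the Jaccard distance is maximal for each, whereas the tree-aware distance is smaller for the pair of close relatives. This is the kind of extra resolution a hybrid tree is meant to make available for markers that cannot be aligned across distant taxa.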

Conclusions

The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees.

Availability

ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.

Electronic supplementary material

The online version of this article (doi:10.1186/s40168-016-0153-6) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Jennifer Fouquier: jennietf@gmail.com

Jai Ram Rideout: jai.rideout@gmail.com

Evan Bolyen: ebolyen@gmail.com

John Chase: chasejohnh@gmail.com

Arron Shiffer: shiffy35@gmail.com

Daniel McDonald: wasade@gmail.com

Rob Knight: robknight@ucsd.edu

J Gregory Caporaso: greg.caporaso@gmail.com

Scott T. Kelley: +1 619 206 8014, skelley@mail.sdsu.edu

Journal

Journal ID (nlm-ta): Microbiome

Journal ID (iso-abbrev): Microbiome

Title: Microbiome

Publisher: BioMed Central (London)

ISSN (Electronic): 2049-2618

Publication date (Electronic): 24 February 2016

Publication date PMC-release: 24 February 2016

Publication date Collection: 2016

Volume: 4

Electronic Location Identifier: 11

Affiliations

Graduate Program in Bioinformatics and Medical Informatics, San Diego State University, San Diego, CA USA

Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ USA

Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ USA

Institute for Systems Biology, Seattle, WA USA

Department of Pediatrics, and Department of Computer Science and Engineering, University of California San Diego, San Diego, CA USA

Department of Biology, San Diego State University, San Diego, CA USA

San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-4614 USA

Article

Publisher ID: 153

DOI: 10.1186/s40168-016-0153-6

PMC ID: 4765138

PubMed ID: 26905735

SO-VID: 2dfe5f43-a583-4020-940b-acaa4a0ef5db

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 18 September 2015

Date accepted: 5 February 2016

Funding

Funded by: Alfred P. Sloan Foundation (FundRef http://dx.doi.org/10.13039/100000879)

Award Recipient: Scott T. Kelley
