Morgan G. I. Langille 1, Jesse Zaneveld 2, J. Gregory Caporaso 3,4, Daniel McDonald 5,6, Dan Knights 7,8, Joshua A. Reyes 9, Jose C. Clemente 10, Deron E. Burkepile 11, Rebecca L. Vega Thurber 2, Rob Knight 10,12, Robert G. Beiko 1, Curtis Huttenhower 9,13
25 August 2013
Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community’s functional capabilities. Here we describe PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this ‘predictive metagenomic’ approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.
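The final step described above, combining per-organism gene family predictions into a composite metagenome, can be illustrated with a minimal sketch. This is not the PICRUSt implementation: it assumes that gene family copy numbers have already been predicted for each taxon, and it omits the ancestral-state reconstruction and the uncertainty estimates entirely. The function name `predict_metagenome`, the taxon identifiers, and the toy counts are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def predict_metagenome(otu_counts, gene_content):
    """Combine per-taxon gene family predictions into one community profile.

    otu_counts:   {taxon_id: abundance of that taxon in the sample},
                  as obtained from marker gene (e.g. 16S) data.
    gene_content: {taxon_id: {gene_family: predicted copies per genome}}.
    """
    metagenome = defaultdict(float)
    for taxon, abundance in otu_counts.items():
        # Assumption of this sketch: taxa without a prediction are skipped.
        for family, copies in gene_content.get(taxon, {}).items():
            # Each taxon contributes its gene copies weighted by its abundance.
            metagenome[family] += abundance * copies
    return dict(metagenome)


# Hypothetical toy data: two taxa and two gene families.
otu_counts = {"otu_a": 10, "otu_b": 5}
gene_content = {
    "otu_a": {"K00001": 1, "K00002": 2},
    "otu_b": {"K00001": 3},
}

profile = predict_metagenome(otu_counts, gene_content)
print(profile)
```

In this toy case the first gene family receives contributions from both taxa, whereas the second is carried by only one, so the composite profile reflects both community composition and per-genome gene content.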