Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’, or clusters, based on the abundances of key bacterial genera in the gut microbiota. However, there is no consensus on the analytical basis for enterotypes or on how these results should be interpreted. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metric, OTU-picking approach, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and 16 additional studies, as well as WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in a dataset depends not only on the structure of the data but also on the methods used to assess clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.
Recent work has suggested that individuals can be classified into ‘enterotypes’ based on the abundance of key bacterial taxa in gut microbial communities. However, whether enterotypes generalize across populations, and whether similar cluster types exist at other body sites, remain to be evaluated. We combined the Human Microbiome Project 16S rRNA gene sequence data and metagenomes with similar published data to assess the existence of enterotypes across body sites. We found that rather than forming enterotypes (note that we use this term for clusters at all body sites), most samples fell along gradients defined by the abundances of taxa such as Bacteroides, although at some body sites samples were distributed bimodally or multimodally along these gradients. Furthermore, many of the methods used in the analysis (e.g., distance metrics and clustering approaches) affected the likelihood of identifying enterotypes in particular body habitats. We recommend that multiple approaches be used and compared when testing for enterotypes.
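The distinction between a smooth gradient and a bimodal distribution, and why a single clustering-strength score can blur it, can be illustrated with a small sketch. This is not the workflow used in the study. It assumes hypothetical one-dimensional relative abundances of a single key genus across 40 samples, absolute difference as the distance, exhaustive k-medoids with k = 2, and the mean silhouette width as the clustering-strength score. The profiles `gradient` and `bimodal` and the helper names `k_medoids_1d` and `mean_silhouette` are invented for illustration.

```python
"""Minimal sketch: compare clustering strength for a smooth gradient
versus a bimodal abundance profile (hypothetical 1-D data)."""
from itertools import combinations


def k_medoids_1d(values, k=2):
    """Exhaustive k-medoids on 1-D values; only practical for small samples."""
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(len(values)), k):
        # Cost = total distance from each sample to its nearest medoid.
        cost = sum(min(abs(v - values[m]) for m in medoids) for v in values)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Label each sample by the position of its nearest medoid.
    labels = []
    for v in values:
        dists = [abs(v - values[m]) for m in best_medoids]
        labels.append(dists.index(min(dists)))
    return labels


def mean_silhouette(values, labels):
    """Mean silhouette width for a two-cluster partition (absolute-difference distance)."""
    scores = []
    for i, v in enumerate(values):
        same = [abs(v - w) for j, w in enumerate(values)
                if j != i and labels[j] == labels[i]]
        other = [abs(v - w) for j, w in enumerate(values)
                 if labels[j] != labels[i]]
        if not same or not other:
            scores.append(0.0)  # convention for singleton or single-cluster cases
            continue
        a = sum(same) / len(same)    # mean distance within own cluster
        b = sum(other) / len(other)  # mean distance to the other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical relative abundances of one key genus across 40 samples.
gradient = [i / 39 for i in range(40)]
bimodal = ([0.05 + 0.005 * i for i in range(20)]
           + [0.80 + 0.005 * i for i in range(20)])

for name, profile in [("gradient", gradient), ("bimodal", bimodal)]:
    labels = k_medoids_1d(profile, k=2)
    print(name, round(mean_silhouette(profile, labels), 2))
```

On these made-up profiles the bimodal data score close to 1, while the evenly spaced gradient still receives a moderate positive score even though it contains no discrete groups. This is one reason to compare several clustering approaches and strength measures rather than relying on a single statistic and cutoff.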