Alejandra Escobar-Zepeda 1, Elizabeth Ernestina Godoy-Lozano 1, Luciana Raggi 1, Lorenzo Segovia 1,2, Enrique Merino 1,2, Rosa María Gutiérrez-Rios 1,2, Katy Juarez 1,2, Alexei F. Licea-Navarro 1,3, Liliana Pardo-Lopez 1,2, Alejandro Sanchez-Flores 1,2
13 August 2018
Metagenomics research has recently thrived due to improvements in DNA sequencing technologies, driving the emergence of new analysis tools and the growth of taxonomic databases. However, no all-purpose strategy can guarantee the best result for a given project, and there are many combinations of software, parameters and databases that can be tested. Therefore, we performed an impartial comparison, using statistical measures of classification, of eight bioinformatic tools and four taxonomic databases, defining a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun sequencing, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless of the score used by each software. A quick guide to selecting the best tool, along with all datasets and scripts needed to reproduce our results and benchmark any new method, is available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress the importance of gold standards, database curation and manual inspection of taxonomic profiling results for a better and more accurate description of microbial diversity.
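The idea of evaluating a classifier against an in silico ground truth at a given score cut-off can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the function name, data layout and example reads are all hypothetical, assuming each simulated read has a known true taxon and each tool reports a (read, predicted taxon, score) triple.

```python
# Hypothetical sketch of cut-off evaluation against simulated ground truth.
# Sensitivity = TP / (TP + FN); precision = TP / (TP + FP).

def metrics_at_cutoff(assignments, truth, cutoff):
    """Sensitivity and precision for calls scoring at or above `cutoff`.

    assignments: iterable of (read_id, predicted_taxon, score) triples
    truth: dict mapping read_id -> true taxon (known from simulation)
    """
    tp = sum(1 for r, t, s in assignments
             if s >= cutoff and truth.get(r) == t)
    fp = sum(1 for r, t, s in assignments
             if s >= cutoff and truth.get(r) != t)
    # Reads filtered out by the cut-off, or misassigned, count as misses.
    fn = len(truth) - tp
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision

# Toy example: three simulated reads, one correct call above the cut-off.
truth = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Vibrio"}
assignments = [("r1", "Escherichia", 0.9),
               ("r2", "Bacillus", 0.4),      # below cut-off: unclassified
               ("r3", "Pseudomonas", 0.8)]   # misassignment above cut-off
print(metrics_at_cutoff(assignments, truth, 0.5))
```

Sweeping `cutoff` over the score range of each tool yields the kind of sensitivity/precision trade-off curves from which a per-tool working cut-off can be chosen.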