Analysis of sequencing strategies and tools for taxonomic annotation: Defining standards for progressive metagenomics

Abstract

Metagenomics research has recently thrived due to improvements in DNA sequencing technologies, driving the emergence of new analysis tools and the growth of taxonomic databases. However, there is no all-purpose strategy that can guarantee the best result for a given project, and there are several combinations of software, parameters and databases that can be tested. Therefore, we performed an impartial comparison, using statistical measures of classification for eight bioinformatic tools and four taxonomic databases, defining a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun data, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless of the score used by each software. A quick guide to select the best tool, all datasets and scripts to reproduce our results and benchmark any new method are available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress the importance of gold standards, database curation and manual inspection of taxonomic profiling results, for a better and more accurate microbial diversity description.
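As a rough illustration of what a score-independent cut-off evaluation might look like, the Python sketch below computes an error rate and a coverage value for a set of read assignments at a chosen cut-off, plus a Matthews correlation coefficient from confusion-matrix counts. This is not the authors' code: the function names, the input layout (one score and one correct/incorrect flag per read) and the exact definitions of error rate and coverage are assumptions made for this example, and the abstract does not state which statistical measures were used. The authors' own scripts are in the repository cited above.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here rather than taken from the article).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def error_rate_and_coverage(assignments, cutoff):
    """Evaluate read assignments at a given score cut-off.

    assignments: list of (score, is_correct) pairs, one per simulated read,
                 where is_correct says whether the assigned taxon matches
                 the known origin of the read (hypothetical input layout).
    cutoff:      minimum score for an assignment to be kept.

    Assumed definitions for this sketch:
      coverage   = kept assignments / all reads
      error rate = incorrect kept assignments / kept assignments
    """
    kept = [ok for score, ok in assignments if score >= cutoff]
    coverage = len(kept) / len(assignments) if assignments else 0.0
    error_rate = kept.count(False) / len(kept) if kept else 0.0
    return error_rate, coverage
```

Because the function only compares each score against a cut-off, the same evaluation can be repeated for any tool whose output includes a numeric score per read, whatever that score represents.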

Most cited references 22

FLASH: fast length adjustment of short reads to improve genome assemblies.

T. Magoc, S. L. Salzberg (2013)

Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. Contact: t.magoc@gmail.com.

ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data

Jaime Huerta-Cepas, François Serra, Peer Bork (2016)

The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org

Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

B W Matthews (1975)

Predictions of the secondary structure of T4 phage lysozyme, made by a number of investigators on the basis of the amino acid sequence, are compared with the structure of the protein determined experimentally by X-ray crystallography. Within the amino terminal half of the molecule the locations of helices predicted by a number of methods agree moderately well with the observed structure, however within the carboxyl half of the molecule the overall agreement is poor. For eleven different helix predictions, the coefficients giving the correlation between prediction and observation range from 0.14 to 0.42. The accuracy of the predictions for both beta-sheet regions and for turns are generally lower than for the helices, and in a number of instances the agreement between prediction and observation is no better than would be expected for a random selection of residues. The structural predictions for T4 phage lysozyme are much less successful than was the case for adenylate kinase (Schulz et al. (1974) Nature 250, 140-142). No one method of prediction is clearly superior to all others, and although empirical predictions based on larger numbers of known protein structure tend to be more accurate than those based on a limited sample, the improvement in accuracy is not dramatic, suggesting that the accuracy of current empirical predictive methods will not be substantially increased simply by the inclusion of more data from additional protein structure determinations.

Author and article information

Contributors

Alejandro Sanchez-Flores: alexsf@ibt.unam.mx

Journal

Journal ID (nlm-ta): Sci Rep

Journal ID (iso-abbrev): Sci Rep

Title: Scientific Reports

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2045-2322

Publication date (Electronic): 13 August 2018

Publication date PMC-release: 13 August 2018

Publication date Collection: 2018

Volume: 8

Electronic Location Identifier: 12034

Affiliations

[1] ISNI 0000 0001 2159 0001, GRID grid.9486.3, Consorcio de Investigación del Golfo de México (CIGOM), Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico

[2] ISNI 0000 0001 2159 0001, GRID grid.9486.3, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico

[3] ISNI 0000 0000 9071 1447, GRID grid.462226.6, Departamento de Innovación Biomédica, CICESE, Carretera Ensenada-Tijuana 3918, Zona Playitas, Ensenada, BC, Mexico

Author information

Alejandra Escobar-Zepeda http://orcid.org/0000-0003-3549-9115

Elizabeth Ernestina Godoy-Lozano http://orcid.org/0000-0001-6927-9132

Lorenzo Segovia http://orcid.org/0000-0002-4291-4711

Alexei F. Licea-Navarro http://orcid.org/0000-0003-4022-7405

Article

Publisher ID: 30515

DOI: 10.1038/s41598-018-30515-5

PMC ID: 6089906

PubMed ID: 30104688

SO-VID: d7f6b748-9c13-4385-8965-3a1e85bb1173

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received : 22 January 2018

Date accepted : 24 July 2018

Funding

Funded by: Salary funded by the National Council of Science and Technology of Mexico - Mexican Ministry of Energy - Hydrocarbon Trust, project 201441. This is a contribution of the Gulf of Mexico Research Consortium (CIGoM).

Custom metadata

ScienceOpen disciplines: Uncategorized
