14
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Analysis of sequencing strategies and tools for taxonomic annotation: Defining standards for progressive metagenomics

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Metagenomics research has recently thrived due to DNA sequencing technologies improvement, driving the emergence of new analysis tools and the growth of taxonomic databases. However, there is no all-purpose strategy that can guarantee the best result for a given project and there are several combinations of software, parameters and databases that can be tested. Therefore, we performed an impartial comparison, using statistical measures of classification for eight bioinformatic tools and four taxonomic databases, defining a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun data, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless the score used by each software. A quick guide to select the best tool, all datasets and scripts to reproduce our results and benchmark any new method are available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress out the importance of gold standards, database curation and manual inspection of taxonomic profiling results, for a better and more accurate microbial diversity description.

          Related collections

          Most cited references22

          • Record: found
          • Abstract: found
          • Article: not found

          FLASH: fast length adjustment of short reads to improve genome assemblies.

          Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. t.magoc@gmail.com.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data

            The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

              Predictions of the secondary structure of T4 phage lysozyme, made by a number of investigators on the basis of the amino acid sequence, are compared with the structure of the protein determined experimentally by X-ray crystallography. Within the amino terminal half of the molecule the locations of helices predicted by a number of methods agree moderately well with the observed structure, however within the carboxyl half of the molecule the overall agreement is poor. For eleven different helix predictions, the coefficients giving the correlation between prediction and observation range from 0.14 to 0.42. The accuracy of the predictions for both beta-sheet regions and for turns are generally lower than for the helices, and in a number of instances the agreement between prediction and observation is no better than would be expected for a random selection of residues. The structural predictions for T4 phage lysozyme are much less successful than was the case for adenylate kinase (Schulz et al. (1974) Nature 250, 140-142). No one method of prediction is clearly superior to all others, and although empirical predictions based on larger numbers of known protein structure tend to be more accurate than those based on a limited sample, the improvement in accuracy is not dramatic, suggesting that the accuracy of current empirical predictive methods will not be substantially increased simply by the inclusion of more data from additional protein structure determinations.
                Bookmark

                Author and article information

                Contributors
                alexsf@ibt.unam.mx
                Journal
                Sci Rep
                Sci Rep
                Scientific Reports
                Nature Publishing Group UK (London )
                2045-2322
                13 August 2018
                13 August 2018
                2018
                : 8
                : 12034
                Affiliations
                [1 ]ISNI 0000 0001 2159 0001, GRID grid.9486.3, Consorcio de Investigación del Golfo de México (CIGOM), , Instituto de Biotecnología, Universidad Nacional Autónoma de México, ; Cuernvaca, Mexico
                [2 ]ISNI 0000 0001 2159 0001, GRID grid.9486.3, Instituto de Biotecnología, , Universidad Nacional Autónoma de México, ; Cuernvaca, Mexico
                [3 ]ISNI 0000 0000 9071 1447, GRID grid.462226.6, Departamento de Innovación Biomédica, , CICESE. Carretera Ensenada-Tijuana 3918, Zona Playitas, ; Ensenada, BC Mexico
                Author information
                http://orcid.org/0000-0003-3549-9115
                http://orcid.org/0000-0001-6927-9132
                http://orcid.org/0000-0002-4291-4711
                http://orcid.org/0000-0003-4022-7405
                Article
                30515
                10.1038/s41598-018-30515-5
                6089906
                30104688
                d7f6b748-9c13-4385-8965-3a1e85bb1173
                © The Author(s) 2018

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 22 January 2018
                : 24 July 2018
                Funding
                Funded by: Salary funded by the National Council of Science and Technology of Mexico - Mexican Ministry of Energy - Hydrocarbon Trust, project 201441. This is a contribution of the Gulf of Mexico Research Consortium (CIGoM).
                Categories
                Article
                Custom metadata
                © The Author(s) 2018

                Uncategorized
                Uncategorized

                Comments

                Comment on this article