23
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Bioinformatics Workflows With NoSQL Database in Cloud Computing

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. It is a regular Bioinformatics approach applying workflows to solve problems in Molecular Biology, notably those related to sequence analyses. Due to the nature of the raw data and the in silico environment of Molecular Biology experiments, apart from the research subject, 2 practical and closely related problems have been studied: reproducibility and computational environment. When aiming to enhance the reproducibility of Bioinformatics experiments, various aspects should be considered. The reproducibility requirements comprise the data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment. Cloud computing is a booming alternative that can provide this computational environment, hiding technical details, and delivering a more affordable, accessible, and configurable on-demand environment for researchers. Considering this specific scenario, we proposed a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet the goal, we have built 3 typical Bioinformatics workflows and ran them on 1 private and 2 public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM). We present here the results and a guide for the deployment of a cloud environment for Bioinformatics exploring the characteristics of various NoSQL database systems to persist provenance data.

          Related collections

          Most cited references59

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Trimmomatic: a flexible trimmer for Illumina sequence data

          Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            The Sequence Alignment/Map format and SAMtools

            Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Basic local alignment search tool.

              A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
                Bookmark

                Author and article information

                Journal
                Evol Bioinform Online
                Evol. Bioinform. Online
                EVB
                spevb
                Evolutionary Bioinformatics Online
                SAGE Publications (Sage UK: London, England )
                1176-9343
                5 December 2019
                2019
                : 15
                : 1176934319889974
                Affiliations
                [1 ]Department of Computer Science, University of Brasília, Brasília, Brazil
                [2 ]NEPBIO (Group of Biological Studies and Research on Cerrado), Federal Institute of Goiás (IFG), Formosa, Goiás, Brazil
                [3 ]Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
                Author notes
                [*]Polyane Wercelens, University of Brasília, Brasília 70910-900, Brazil. Email: polyane.wercelens@ 123456gmail.com
                Author information
                https://orcid.org/0000-0002-8494-1267
                https://orcid.org/0000-0002-8660-6331
                https://orcid.org/0000-0003-3073-3734
                https://orcid.org/0000-0002-0883-2579
                Article
                10.1177_1176934319889974
                10.1177/1176934319889974
                6896126
                31839702
                66c42b96-abda-426a-940d-ace260ceb8c4
                © The Author(s) 2019

                This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License ( http://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages ( https://us.sagepub.com/en-us/nam/open-access-at-sage).

                History
                : 27 October 2019
                : 29 October 2019
                Categories
                Original Research
                Custom metadata
                January-December 2019
                ts1

                Bioinformatics & Computational biology
                bioinformatics workflows,reproducibility,data provenance,cloud computing,nosql

                Comments

                Comment on this article