16
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.

          Related collections

          Most cited references9

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          panX: pan-genome analysis and exploration

          Abstract Horizontal transfer, gene loss, and duplication result in dynamic bacterial genomes shaped by a complex mixture of different modes of evolution. Closely related strains can differ in the presence or absence of many genes, and the total number of distinct genes found in a set of related isolates—the pan-genome—is often many times larger than the genome of individual isolates. We have developed a pipeline that efficiently identifies orthologous gene clusters in the pan-genome. This pipeline is coupled to a powerful yet easy-to-use web-based visualization for interactive exploration of the pan-genome. The visualization consists of connected components that allow rapid filtering and searching of genes and inspection of their evolutionary history. For each gene cluster, panX displays an alignment, a phylogenetic tree, maps mutations within that cluster to the branches of the tree and infers gain and loss of genes on the core-genome phylogeny. PanX is available at pangenome.de. Custom pan-genomes can be visualized either using a web server or by serving panX locally as a browser-based application.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            An introduction to Docker for reproducible research, with examples from the R environment

            As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straight forward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes being able to reproduce and extend such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research - such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a `DevOps' philosophy, to address these challenges. I illustrate this with several examples of Docker use with a focus on the R statistical environment.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              The impact of Docker containers on the performance of genomic pipelines

              Genomic pipelines consist of several pieces of third party software and, because of their experimental nature, frequent changes and updates are commonly necessary thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. Thus, the question that arises is to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.
                Bookmark

                Author and article information

                Contributors
                Journal
                PeerJ
                PeerJ
                PeerJ
                PeerJ
                PeerJ
                PeerJ Inc. (San Francisco, USA )
                2167-8359
                30 November 2017
                2017
                : 5
                : e3948
                Affiliations
                [1 ] Protection Research Center of Pomology, Research Institute of Pomology, Chinese Academy of Agricultural Sciences , Xingcheng, China
                [2 ] Forest Protection Research Institute of Heilongjiang Province , Harbin, China
                [3 ] Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry , Beijing, China
                [4 ] School of Forestry, Northeast Forestry University , Harbin, China
                [5 ] Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences , Beijing, China
                Article
                3948
                10.7717/peerj.3948
                5712467
                d6536fa4-8787-4233-a998-848f1fc17b6a
                © 2017 Cheng et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                History
                : 21 June 2017
                : 30 September 2017
                Funding
                Funded by: National Key R&D Program of China
                Award ID: 2017YFD0600103
                Funded by: Fundamental Research Fund for Central Non-profit Scientific Institution
                Award ID: Y2016PT38
                Funded by: CAAS-ASTIP
                This work was supported by the National Key R&D Program of China (2017YFD0600103), Fundamental Research Fund for Central Non-profit Scientific Institution (Y2016PT38), and CAAS-ASTIP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Bioinformatics
                Computational Biology
                Genomics
                Microbiology

                docker,pan-genome,biosynthetic gene clusters,bacillus amyloliquefaciens

                Comments

                Comment on this article