13
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      The application of Hadoop in Structural Bioinformatics

      Preprint
      ,
      bioRxiv

      Read this article at

      ScienceOpenPublisher
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          This paper reviews the use of the Hadoop platform in Structural Bioinformatics applications. Specifically, we review a number of implementations using Hadoop of high-throughput analyses, e.g. ligand-protein docking and structural alignment, and their scalability in comparison with other batch schedulers and MPI. We find that these deployments for the most part use known executables called from MapReduce rather than rewriting the algorithms. The scalability exhibits a variable behaviour in comparison with other batch schedulers, particularly as direct comparisons on the same platform are generally not available. We do note there is some evidence that MPI implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of the interface and configuration of a resource to use Hadoop. This will improve over time as interfaces to Hadoop e.g. Spark improve, usage of cloud platforms (e.g. Azure and AWS) increases and approaches such as the Workflow Definition Language are taken up.

          Related collections

          Author and article information

          Journal
          bioRxiv
          July 25 2018
          Article
          10.1101/376467
          7836bf1c-e75d-430d-9e7e-09b3fbc49886
          © 2018
          History

          Quantitative & Systems biology,Biophysics
          Quantitative & Systems biology, Biophysics

          Comments

          Comment on this article