      Is Open Access

      ukbREST: efficient and streamlined data access for reproducible research in large biobanks

      brief-report
      Bioinformatics
      Oxford University Press


          Abstract

          Summary

          Large biobanks, such as UK Biobank with half a million participants, are changing the scale and availability of genotypic and phenotypic data for researchers to ask fundamental questions about the biology of health and disease. The breadth of the UK Biobank data is enabling discoveries at an unprecedented pace. However, this size and complexity pose new challenges to investigators who need to keep the accruing data up to date, comply with potential consent changes, and efficiently and reproducibly extract subsets of the data to answer specific scientific questions. Here we propose a tool called ukbREST, designed for the UK Biobank study (and easily extensible to other biobanks), which allows authorized users to efficiently retrieve phenotypic and genetic data. It exposes a REST API that makes data highly accessible inside a private and secure network, and it lets users write the data specification in a human-readable text format that is easily shared with other researchers. These characteristics make ukbREST an important tool for making biobanks' valuable data more readily accessible to the research community and for facilitating reproducibility of analyses, a key aspect of science.
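
As a rough illustration of the idea in the summary above (a shareable text specification that is turned into a REST query), the sketch below reads a small data specification and builds a query URL from it. It is not taken from the article: the base URL, the endpoint path, the `columns` and `filters` parameter names, the JSON layout, and the column names are all assumptions chosen for illustration, and the actual ukbREST interface may differ. No network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical server address and endpoint path (assumptions, not from the article).
BASE_URL = "http://localhost:5000/api/phenotype"

# A human-readable data specification that could be shared between researchers.
# JSON is used here only because it is in the standard library; the column
# names and the filter expression are illustrative placeholders.
SPEC_TEXT = """
{
  "columns": ["c50_0_0 as height", "c21002_0_0 as weight"],
  "filters": ["c31_0_0 = 1"]
}
"""


def build_query_url(spec_text: str, base_url: str = BASE_URL) -> str:
    """Turn a text specification into a URL with repeated query parameters."""
    spec = json.loads(spec_text)
    # Each requested column and each filter becomes its own query parameter.
    params = [("columns", column) for column in spec.get("columns", [])]
    params += [("filters", condition) for condition in spec.get("filters", [])]
    # urlencode percent-encodes the values so the URL is safe to send.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(SPEC_TEXT))
```

Because the whole request is derived from the text specification, sharing that one file is enough for another authorized user to reproduce the same data extraction against their own deployment.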

          Availability and implementation

          It is implemented in Python using the Flask-RESTful framework for the API and is released under the MIT license. It works with PostgreSQL, and a Docker image is available for easy deployment. The source code and documentation are available on GitHub: https://github.com/hakyimlab/ukbrest.

          Most cited references (1)

          Is Open Access

          Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

          Background: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it.

          Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required.

          Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample.

          Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.

            Author and article information

            Contributors
            Role: Associate Editor
            Journal: Bioinformatics
            Publisher: Oxford University Press
            ISSN: 1367-4803, 1367-4811
            Publication dates: 01 June 2019; 05 November 2018
            Volume: 35
            Issue: 11
            Pages: 1971-1973
            Affiliations
            [1] Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, IL, USA
            [2] Center for Translational Data Science, The University of Chicago, Chicago, IL, USA
            Author notes
            To whom correspondence should be addressed: haky@uchicago.edu
            Author information
            http://orcid.org/0000-0002-3035-4403
            http://orcid.org/0000-0003-0333-5685
            Article
            bty925
            DOI: 10.1093/bioinformatics/bty925
            PMCID: PMC6546122
            PMID: 30395166
            ce636072-1bbe-45dd-add3-358766ad9fb4
            © The Author(s) 2018. Published by Oxford University Press.

            This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

            History
            Received: 3 August 2018
            Revised: 26 October 2018
            Accepted: 3 November 2018
            Page count
            Pages: 3
            Funding
            Funded by: National Institutes of Health Cloud Credits Model Pilot
            Award ID: R01 MH107666
            Funded by: DRTC 10.13039/100007800
            Award ID: P30 DK020595
            Categories
            Applications Notes
            Genetics and Population Analysis

            Bioinformatics & Computational biology