      A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          This work presents PyEHR, a scalable data access layer designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR formalisms to decouple data descriptions from implementation details, and it exploits structure indexing to accelerate searches. Data persistence is provided by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called “Constant Load” and “Constant Number of Records”, with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.
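
          The two ideas named in the abstract, a common driver interface and structure indexing, can be illustrated with a minimal sketch. This is not PyEHR's actual API: the names `Driver`, `InMemoryDriver`, `structure_key`, and `query` are hypothetical, and an in-memory dictionary stands in for MongoDB or Elasticsearch. The sketch assumes that records sharing the same shape (keys and nesting, ignoring leaf values) can be grouped under one structure key, so that a query first narrows the search to matching structures and only then inspects values.

```python
from abc import ABC, abstractmethod
import hashlib
import json


def structure_key(record):
    """Hash a record's shape (keys and nesting), ignoring leaf values."""
    def shape(node):
        if isinstance(node, dict):
            return {k: shape(v) for k, v in node.items()}
        if isinstance(node, list):
            return [shape(v) for v in node]
        return None  # leaf values do not contribute to the structure

    canonical = json.dumps(shape(record), sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class Driver(ABC):
    """Hypothetical common interface that every storage backend implements."""

    @abstractmethod
    def save(self, record):
        """Persist a record and return its structure key."""

    @abstractmethod
    def find_by_structure(self, key):
        """Return all records stored under the given structure key."""


class InMemoryDriver(Driver):
    """Stand-in backend; a real driver would wrap a database client."""

    def __init__(self):
        self._by_structure = {}

    def save(self, record):
        key = structure_key(record)
        self._by_structure.setdefault(key, []).append(record)
        return key

    def find_by_structure(self, key):
        return list(self._by_structure.get(key, []))


def query(driver, template, predicate):
    """Narrow candidates by structure first, then filter by value."""
    candidates = driver.find_by_structure(structure_key(template))
    return [r for r in candidates if predicate(r)]
```

          Because `query` depends only on the `Driver` interface, swapping the backend would not change calling code; that is the kind of decoupling the abstract attributes to the driver layer.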


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA USA)
                ISSN: 1932-6203
                Published: 9 December 2016
                Volume: 11
                Issue: 12
                Article: e0168004
                Affiliations
                [001] Data-Intensive Computing Group, CRS4, Pula, Italy
                West Virginia University, UNITED STATES
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                • Conceptualization: GZ GD LL FF.

                • Data curation: GD LL.

                • Formal analysis: GD LL GZ.

                • Investigation: GD LL.

                • Methodology: GZ GD LL FF.

                • Project administration: GZ.

                • Resources: GZ GD LL FF.

                • Software: GD LL.

                • Supervision: GZ.

                • Validation: GD LL.

                • Visualization: GD.

                • Writing – original draft: GD.

                • Writing – review & editing: GD GZ.

                Author information
                http://orcid.org/0000-0002-1023-2257
                Article
                Manuscript ID: PONE-D-16-31189
                DOI: 10.1371/journal.pone.0168004
                PMCID: PMC5148592
                PMID: 27936191
                6c1e7561-964e-42a7-b1ca-6155926853ac
                © 2016 Delussu et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 4 August 2016
                Accepted: 24 November 2016
                Page count
                Figures: 39, Tables: 0, Pages: 38
                Funding
                The authors received no specific funding for this work.
                Categories
                Research Article
                Computer and Information Sciences > Information Technology > Databases
                Computer and Information Sciences > Data Management
                Engineering and Technology > Structural Engineering > Built Structures
                Biology and Life Sciences > Computational Biology > Biological Data Management
                Computer and Information Sciences > Information Technology > Databases > Relational Databases
                Computer and Information Sciences > Data Visualization > Infographics > Graphs
                Computer and Information Sciences > Programming Languages
                Social Sciences > Linguistics > Phonology > Syntax
                Custom metadata
                PyEHR, described in this manuscript, is available on GitHub: https://github.com/crs4/pyEHR. To replicate the results, the datasets and an accompanying explanation can be found at: ftp://ftp.crs4.it/surfer/public/PYEHR_TESTING_DATA/.
