
      Analysis of data dictionary formats of HIV clinical trials

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Background

          Efforts to define research Common Data Elements aim to harmonize data collection across clinical studies.

          Objective

          Our goal was to analyze the quality and usability of data dictionaries of HIV studies.

          Methods

          For the clinical domain of HIV, we searched data sharing platforms and acquired a set of 18 HIV-related studies, from which we analyzed 26 328 data elements. We identified existing standards for creating a data dictionary and reviewed their use. To facilitate aggregation across studies, we defined three types of data dictionary (data elements, forms, and permissible values) and created a simple information model for each type.
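As a rough illustration of what a simple information model for the three dictionary types might look like, the sketch below uses Python dataclasses. The abstract does not list the actual fields, so every class and field name here (`DataElement`, `Form`, `PermissibleValue`, `study_id`, `form_id`, and so on) is an assumption made for illustration, not the authors' model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    """One entry of a data-element dictionary (field names are assumed)."""
    study_id: str
    name: str
    data_type: Optional[str] = None    # e.g. "string", "numeric", "date"
    description: Optional[str] = None
    form_id: Optional[str] = None      # group or form the element belongs to


@dataclass
class Form:
    """One entry of a forms dictionary (field names are assumed)."""
    study_id: str
    form_id: str
    title: Optional[str] = None


@dataclass
class PermissibleValue:
    """One allowed value for a data element (field names are assumed)."""
    study_id: str
    element_name: str
    code: str
    label: Optional[str] = None


def elements_by_form(elements):
    """Group data element names by (study_id, form_id)."""
    groups = {}
    for element in elements:
        key = (element.study_id, element.form_id)
        groups.setdefault(key, []).append(element.name)
    return groups


# Hypothetical toy records, not taken from any of the analyzed studies.
toy_elements = [
    DataElement("S1", "age", "numeric", "Age in years", "DEMOG"),
    DataElement("S1", "sex", "string", "Sex at birth", "DEMOG"),
    DataElement("S1", "cd4", "numeric", "CD4 count", "LAB"),
]
grouped = elements_by_form(toy_elements)
```

Keeping the study identifier on every record is one way to let dictionaries from several studies be concatenated and aggregated without losing provenance.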

          Results

          An average study had 427 data elements (ranging from 46 elements to 9 945 elements). In terms of data type, 48.6% of data elements were string, 47.8% were numeric, 3.0% were date and 0.6% were date-time. No study in our sample explicitly declared a data element as a categorical variable; such elements were instead typed as either string or numeric. We were able to obtain permissible values for only 61% of studies. The majority of studies used CSV files to share a data dictionary, while 22% of the studies used a non-computable PDF format. All studies grouped their data elements. The average number of groups or forms per study was 24 (ranging between 2 and 124 groups/forms). An accurate and well-formatted data dictionary facilitates error-free secondary analysis and can help with data de-identification.
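Summary statistics of this kind can be derived from an aggregated list of data elements. The following is a minimal sketch under assumed inputs: each element is represented as a hypothetical `(study_id, data_type)` pair, and the toy data are invented for illustration. Because the abstract does not state whether "average" refers to a mean or a median, the sketch reports both.

```python
from collections import Counter
from statistics import mean, median


def type_distribution(elements):
    """Return {data_type: percentage of all elements}, rounded to one decimal."""
    counts = Counter(data_type for _study, data_type in elements)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: round(100 * n / total, 1) for t, n in counts.items()}


def study_size_summary(elements):
    """Summarize the number of data elements per study."""
    sizes = list(Counter(study for study, _type in elements).values())
    return {
        "mean": mean(sizes),
        "median": median(sizes),
        "min": min(sizes),
        "max": max(sizes),
    }


# Hypothetical toy input: (study_id, data_type) pairs, not the paper's data.
toy = [
    ("S1", "string"), ("S1", "numeric"), ("S1", "date"),
    ("S2", "string"),
    ("S3", "numeric"), ("S3", "string"),
]
distribution = type_distribution(toy)
summary = study_size_summary(toy)
```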

          Conclusion

          We observed features of data dictionaries that made them difficult to use and understand. These included multiple data dictionary files, non-machine-readable documents, data elements present in the data but not in the dictionary, and missing data types or descriptions. Building on our experience with aggregating data elements across a large set of studies, we created a set of recommendations (called the CONSIDER statement) that can guide optimal data sharing in future studies.
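Some of the problems listed above can be checked automatically when the dictionary is machine-readable. The sketch below audits a CSV data dictionary against the column names of a dataset. The column names `name`, `data_type`, and `description` are assumptions for illustration; real dictionaries use varying headers, and this is not the procedure used in the study.

```python
import csv
import io


def audit_dictionary(dictionary_csv, data_columns):
    """Report undocumented columns and dictionary rows lacking type or description.

    dictionary_csv: CSV text with assumed headers name, data_type, description.
    data_columns: column names found in the shared dataset.
    """
    rows = list(csv.DictReader(io.StringIO(dictionary_csv)))
    documented = {row["name"] for row in rows}

    def is_blank(row, key):
        # Treat a missing column and an empty cell the same way.
        return not (row.get(key) or "").strip()

    return {
        "undocumented": sorted(set(data_columns) - documented),
        "missing_type": sorted(r["name"] for r in rows if is_blank(r, "data_type")),
        "missing_description": sorted(
            r["name"] for r in rows if is_blank(r, "description")
        ),
    }


# Hypothetical toy dictionary and dataset header.
toy_dictionary = (
    "name,data_type,description\n"
    "age,numeric,Age in years\n"
    "sex,,Sex at birth\n"
    "cd4,numeric,\n"
)
report = audit_dictionary(toy_dictionary, ["age", "sex", "cd4", "visit_dt"])
```

A check like this is only possible for computable formats such as CSV; a dictionary distributed as PDF would first have to be transcribed.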


                Author and article information

                Contributors
                Roles: Data curation, Formal analysis, Methodology, Project administration, Software, Validation, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Methodology, Software, Validation, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 5 October 2020
                Volume: 15
                Issue: 10
                Article number: e0240047
                Affiliations
                Lister Hill National Center for Biomedical Communications, National Library of Medicine, NIH, Bethesda, MD, United States of America
                Institute of Tropical Medicine (NEKKEN), Nagasaki University, Japan
                Author notes

                Competing interests: No authors have competing interests.

                Author information
                http://orcid.org/0000-0002-2705-9550
                Article
                Manuscript number: PONE-D-20-13400
                DOI: 10.1371/journal.pone.0240047
                PMC ID: 7535029
                PubMed ID: 33017454
                ab459eae-253e-4e7b-9f9d-9c3f0d93ee86

                This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 6 May 2020
                Accepted: 17 September 2020
                Page count
                Figures: 0, Tables: 7, Pages: 16
                Funding
                Funded by: Office of AIDS Research (funder ID: http://dx.doi.org/10.13039/100006084)
                This research was supported by the NIH Office of AIDS Research and the Intramural Research Program of the National Institutes of Health (NIH)/National Library of Medicine (NLM)/Lister Hill National Center for Biomedical Communications (LHNCBC). Funder: NIH Office of AIDS Research (https://www.oar.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Microbiology > Medical Microbiology > Microbial Pathogens > Viral Pathogens > Immunodeficiency Viruses > HIV
                Medicine and Health Sciences > Pathology and Laboratory Medicine > Pathogens > Microbial Pathogens > Viral Pathogens > Immunodeficiency Viruses > HIV
                Biology and Life Sciences > Organisms > Viruses > Viral Pathogens > Immunodeficiency Viruses > HIV
                Biology and Life Sciences > Organisms > Viruses > Immunodeficiency Viruses > HIV
                Biology and Life Sciences > Organisms > Viruses > RNA viruses > Retroviruses > Lentivirus > HIV
                Biology and Life Sciences > Microbiology > Medical Microbiology > Microbial Pathogens > Viral Pathogens > Retroviruses > Lentivirus > HIV
                Medicine and Health Sciences > Pathology and Laboratory Medicine > Pathogens > Microbial Pathogens > Viral Pathogens > Retroviruses > Lentivirus > HIV
                Biology and Life Sciences > Organisms > Viruses > Viral Pathogens > Retroviruses > Lentivirus > HIV
                Medicine and Health Sciences > Medical conditions > Infectious diseases > Viral diseases > HIV infections
                Medicine and Health Sciences > Public and occupational health > Preventive medicine > HIV prevention
                Computer and Information Sciences > Data Management > Metadata
                Medicine and Health Sciences > Medical conditions > Infectious diseases > Infectious disease control > Vaccines > Viral vaccines > HIV vaccines
                Biology and Life Sciences > Microbiology > Virology > Viral vaccines > HIV vaccines
                Medicine and Health Sciences > Diagnostic Medicine > Virus Testing
                Medicine and Health Sciences > Pediatrics
                People and Places > Population Groupings > Age Groups > Adults > Young Adults
                Custom metadata
                All data underlying our analysis are publicly available at the project repository at https://github.com/lhncbc/CDE/tree/master/hiv/datadictionary.
