Summary box
The sharing of health data, including clinical trial data, is increasingly required
by research publishers, regulatory agencies, ethics committees and funding bodies.
Despite these requirements, there are currently no clear standards or guidelines
on how, where and when researchers should share their data.
The confusion among researchers regarding issues related to data sharing has led funders
such as The European and Developing Countries Clinical Trials Partnership (EDCTP)
to devise initiatives that will provide their grantees, and the wider scientific community
within the field of global health research, with clear guidance and a range of tools
to facilitate the data sharing process.
In an effort to support and facilitate data sharing, the EDCTP is working in collaboration
with The Global Health Network to assess whether a cross-cutting knowledge hub around
data sharing would help researchers to find the optimum repository and to gather their
data in a form that is ready for sharing.
Over the past several years, we have seen a movement towards a more open way of conducting
science, with recommendations that ought to lead to reproducible methods, analyses
and results, as well as reusable data. Data sharing is widely encouraged and its importance
has been noted in the context of health data, including clinical trials.1 2 Sharing data is
now a standard requirement of publishers, research institutions and regulatory agencies.
Many types of health data are increasingly viewed as global public
goods that should be made available to the wider scientific community without unnecessary
delays, ensuring important findings can be extracted as soon as possible.3 4 Major
funders such as the European Commission, National Institutes of Health, Wellcome Trust,
Bill & Melinda Gates Foundation and The European and Developing Countries Clinical
Trials Partnership (EDCTP; www.edctp.org) are imposing contractual obligations on
their grantees to share their data for free and, ideally, without imposing unnecessary
barriers on data accessibility (ie, open access or appropriate controlled access5).
In the current landscape of global health research, data sharing, including sharing
of clinical data collected during routine patient care, clinical data collected by
clinical trials, as well as metadata, has therefore become a simple necessity.
Not as easy as it sounds
Despite these requirements, sharing of health data in a meaningful manner is neither
straightforward nor commonplace. We suggest that this is at least partially due to
the lack of clear standards and established guidelines explaining where, when and
how to share data. We carried out a gap analysis in order to assess the needs of researchers,
as well as the resources and training available to them. Our approach was threefold.
(1) We used specialist web browsing software to carry out a comprehensive audit of
online training courses, learning materials and educational videos related to data
sharing in health research by querying Bing, Exalead, Google, Yahoo and YouTube. (2)
We conducted a workshop on data sharing and obtained feedback from the attendees regarding
their training needs in this field. (The workshop was organised by The Global Health
Network in collaboration with the Infectious Diseases Data Observatory and carried
out during the EDCTP Ninth Forum (17–21 September 2018) in Lisbon, Portugal.) (3)
We investigated the availability of repositories and their characteristics in order to develop
a tool that will guide researchers to repositories appropriate for their datasets.
Data sharing is complicated and costly in terms of time, effort, expertise and resources.
There are of course other obstacles, including concerns about data sensitivity and
patient privacy, as well as the technical aspects of data processing before the data
can be shared.2–4 6 7
These challenges hold true across multiple contexts: not just among researchers
but also among the public,8 and in high-income settings as well as low-income and middle-income
settings. Overall inequality in health data can be linked to poverty,9 and similarly
data sharing may be particularly challenging for researchers in low-income and middle-income
countries (LMICs).10 For instance, inequities exist between high-income countries
(HICs) and LMICs when it comes to data ownership and reuse.11 One of the main concerns
of primary researchers is that while they spend time and effort collecting and sharing
data, secondary researchers will focus on reusing these data and reaping the benefits,
potentially without proper acknowledgment of the primary researchers, and without
having contributed to the costs of data generation and processing.12 13 Furthermore,
LMIC researchers may not even be able to access outputs of such secondary analyses
produced using their own data, particularly if these are published behind a paywall
in a HIC, and thus their communities will not be able to benefit from the advancements.
Moreover, LMIC researchers will likely be responsible for the necessary community
engagement and any ethical concerns of their study participants relating to informed
consent and data sharing. On top of all these challenges, LMICs also face problems
of limited resources and difficulties in accessing the training necessary to build
research capacity for data management, processing, analysis and sharing.11 14–17
The nature of working with data is changing at an unprecedented rate due to advancements
in technology and analytics techniques.18 19 Therefore, it is not sufficient to simply
require data to be shared, without providing guidance and assistance with the process,
especially if the objective is to share the data in a responsible and useful way.
Yet, we struggled to find organisations that provide the tools and resources necessary
to fulfil the data sharing requirements they impose. Furthermore, the situation is not helped
by the lack of follow-up from the organisations requiring that data are shared. Given
that there are few incentives and multiple barriers to data sharing, regardless of
whether these incentives and barriers are actual or perceived, as well as a lack of
support and, ultimately, of consequences, perhaps it is not surprising that data sharing
has not been taken up more quickly.
What can we do?
Some of the existing initiatives supporting data sharing include platforms providing
advice, such as the Digital Curation Centre (http://www.dcc.ac.uk), the Research Data
Alliance (https://rd-alliance.org) and Chatham House’s guide to sharing health surveillance
data (https://datasharing.chathamhouse.org), repositories where data can be archived
(with re3data.org collating multiple repositories), consortia working on standards
supporting interoperability between different systems (eg, the Clinical Data Interchange
Standards Consortium (https://www.cdisc.org)), groups developing tools for specific
diseases (eg, Malaria Toolkit (Infectious Diseases Data Observatory; https://www.wwarn.org/tools-resources/malaria-clinical-trials-toolkit),
Ebola Data Tools (ISARIC; https://isaric.tghn.org/protocols/ebola-data-tools/), Zika
Research Tools (ISARIC, PREPARE Europe, and partners; https://zikainfection.tghn.org/research-tools-and-resources))
and trial registries facilitating discovery of data sets, such as ClinicalTrials.gov,
ISRCTN (http://www.isrctn.com) and the EU Clinical Trials Register (https://www.clinicaltrialsregister.eu).
We believe that to address the conceptual difficulties, as well as the legal and ethical
concerns, clear and concise information explaining the terminology, funder requirements
and policies and the core components of the process of data sharing ought to be easily
accessible in a central knowledge hub that is relevant for various health-related
data and for a range of study types (from observational to clinical trials), regions
and organisations. Bringing the information together has the clear advantage of saving
the researcher and/or data manager’s time that would otherwise be spent on searching
through multiple guides/websites/protocols. Duplication of content should be avoided
where practical, for instance, through providing an overview of the issues and signposting
to further, more detailed resources. Data sharing should also be put in context—it
should be considered throughout the life of the project rather than treated as an
afterthought. The resources should reflect this approach.
Capacity development in terms of the technical skills necessary to process, analyse
and share data is vital20 and could be addressed through a blend of face-to-face training
and complementary online resources. While practical workshops are an impactful way
to teach technical skills, they are also expensive to run—and the cost may be prohibitive,
particularly in resource-poor settings, limiting attendance to those who can afford
to travel to such workshops. Learning materials (such as articles, recordings of seminars,
handbooks and online courses) that are available without an institutional affiliation and without
a fee would help to bridge the gap and to ensure that everyone who needs to develop
these skills is supported in doing so. Providing clear guidance and a variety of resources
in one easily identifiable place that can be referred to as needed should go some
way towards addressing the concerns about the time necessary to prepare and share
data. In terms of funding for activities related to data sharing, incorporating data
sharing into data management plans and funding applications should be supported by
providing practical guidance on how to effectively develop such plans.
Funders have an interest in supporting their grantees—and the wider scientific community—with
clearer guidance and a range of tools facilitating the process of data sharing. Encouragingly,
funders are taking steps to improve the situation by financing projects related to
data sharing and, as described above, there are now a few different types of initiatives
supporting data sharing. Researchers are also contributing by collating and publishing
information in order to facilitate the development of guidelines and principles.14
16 21–25 Currently, EDCTP is working in collaboration with The Global Health Network
to create a single ‘go-to’ platform, The Knowledge Hub, that will facilitate all aspects
of data sharing in health research.
The Knowledge Hub (EDCTPKnowledgeHub.tghn.org) will provide free and accessible resources,
guidance and training on how to manage and share data. This includes resources relevant
to all stages of data sharing, from data collection, processing and management, through
preparation of metadata and documentation, to guidance on choosing an appropriate
repository for data deposition (Box 1). The aim of this Hub is to become a beneficial
resource for researchers that can guide and support the process of running a research
project, including data sharing. While the focus of EDCTP is on clinical trials, many,
if not most, of the Hub’s resources should be applicable to working with other health
data, and not just limited to clinical trials data. Ongoing feedback from the research
community will be essential to refine and validate the usefulness of this resource
and to improve data sharing practices of research teams working with health data.
Box 1
Resources for practitioners and researchers
The Knowledge Hub (https://EDCTPKnowledgeHub.TGHN.org) contains a ‘Data Sharing Toolkit’
covering various aspects of the process of sharing research data—from understanding
the different models of access (open/controlled/closed) and options offered by repositories,
through preparing the data (eg, naming conventions, de-identification of sensitive
data, using non-proprietary data formats) and the documentation (how to write a README
file), to a checklist of files to include with a data submission and an example of
the submission process. The Global Health Network and EDCTP are also developing a
‘Repository Tool’ that will guide researchers through the process of choosing a repository
appropriate for their data set.
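As an illustration of two of the preparation steps named above (de-identification of sensitive data and the use of a non-proprietary format), a minimal Python sketch is given here. It is not taken from the Toolkit: the column names, the salt, the example records and the function names are all hypothetical. Note also that replacing an identifier with a salted hash is pseudonymisation only; indirect identifiers such as dates or locations would need separate review before a data set could be considered de-identified.

```python
import csv
import hashlib
import io

# Hypothetical column names; each study would define its own direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}
ID_COLUMN = "participant_id"


def pseudonymise(value: str, salt: str) -> str:
    """Return a short, stable pseudonym for an identifier (salted SHA-256)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def prepare_for_sharing(rows, salt):
    """Drop direct identifiers and replace the participant ID with a pseudonym."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept[ID_COLUMN] = pseudonymise(row[ID_COLUMN], salt)
        cleaned.append(kept)
    return cleaned


def to_csv(rows) -> str:
    """Serialise the rows as CSV text, a non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records, for illustration only.
raw = [
    {"participant_id": "P001", "name": "A. Example", "phone": "000",
     "address": "X", "age": "34", "outcome": "cured"},
    {"participant_id": "P002", "name": "B. Example", "phone": "111",
     "address": "Y", "age": "51", "outcome": "relapsed"},
]

shared = prepare_for_sharing(raw, salt="keep-this-secret")
print(to_csv(shared))
```

In this sketch the salt would be stored securely and never shared with the data, so that the pseudonyms cannot be recomputed from known participant IDs.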
Furthermore, the ‘Data Sharing Toolkit’ includes a collection of nearly 200 external
resources (https://edctpknowledgehub.tghn.org/data-sharing-toolkit/collated-external-resources/),
including guides, recordings of seminars and comprehensive e-learning courses. Additionally,
as data processing is crucial to preparing data sets for sharing, ‘The Hub’ also covers
the basics of data management, which will be developed into a rich ‘Data Management
Portal’ in due course.
The objective of ‘The Knowledge Hub’ is to be a single ‘go-to’ platform, not only providing
researchers with start-to-finish guidance on all aspects of working with health data
and the eventual data sharing, but also supplying bespoke tools that will make the
process easier. All resources available within ‘The Hub’, as well as anything included
in the collection of external resources, are freely accessible to all users regardless
of institutional affiliation or funder. It will be important that research teams provide feedback
on the usefulness of this platform.