      Challenges in administrative data linkage for research

      research-article

          Abstract

          Linkage of population-based administrative data is a valuable tool for combining detailed individual-level information from different sources for research. While not a substitute for classical studies based on primary data collection, analyses of linked administrative data can answer questions that require large sample sizes or detailed data on hard-to-reach populations, and generate evidence with a high level of external validity and applicability for policy making. There are unique challenges in the appropriate research use of linked administrative data, for example with respect to bias from linkage errors where records cannot be linked or are linked together incorrectly. For confidentiality and other reasons, the separation of data linkage processes and analysis of linked data is generally regarded as best practice. However, the ‘black box’ of data linkage can make it difficult for researchers to judge the reliability of the resulting linked data for their required purposes. This article aims to provide an overview of challenges in linking administrative data for research. We aim to increase understanding of the implications of (i) the data linkage environment and privacy preservation; (ii) the linkage process itself (including data preparation, and deterministic and probabilistic linkage methods) and (iii) linkage quality and potential bias in linked data. We draw on examples from a number of countries to illustrate a range of approaches for data linkage in different contexts.
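The difference between the deterministic and probabilistic linkage methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field names and the m- and u-probabilities (the chance that a field agrees for a true match and for a non-match, respectively) are hypothetical values chosen for illustration, and the scoring follows the standard Fellegi-Sunter log-ratio form.

```python
import math

# Hypothetical (m, u) probabilities per identifier; illustrative values only.
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELDS = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.85, 0.02),
}


def deterministic_match(a, b, keys=("surname", "date_of_birth", "postcode")):
    """Deterministic rule: link only if every key is present and agrees exactly."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


def match_weight(a, b, fields=FIELDS):
    """Probabilistic score: sum of log2 agreement/disagreement weights."""
    total = 0.0
    for name, (m, u) in fields.items():
        if a.get(name) is None or b.get(name) is None:
            continue  # a missing value contributes no evidence either way
        if a[name] == b[name]:
            total += math.log2(m / u)  # agreement adds a positive weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return total


# Made-up records: same person, but the postcode changed after a move.
rec_a = {"surname": "SMITH", "date_of_birth": "1980-02-14", "postcode": "AB1 2CD"}
rec_b = {"surname": "SMITH", "date_of_birth": "1980-02-14", "postcode": "EF3 4GH"}

print(deterministic_match(rec_a, rec_b))  # the exact-match rule rejects the pair
print(match_weight(rec_a, rec_b))         # the score stays positive
```

In this sketch the exact-match rule misses the pair because one identifier changed, while the probabilistic score remains positive. Whether such a pair is accepted as a link depends on a threshold that the linker must choose, and that choice trades missed matches against false matches.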

          Most cited references (38)

          Administrative record linkage as a tool for public health research.

          Linked administrative databases offer a powerful resource for studying important public health issues. Methods developed and implemented in several jurisdictions across the globe have achieved high-quality linkages for conducting health and social research without compromising confidentiality. Key data available for linkage include health services utilization, population registries, place of residence, family ties, educational outcomes, and use of social services. Linking events for large populations of individuals across disparate sources and over time permits a range of research possibilities, including the capacity to study low-prevalence exposure-disease associations, multiple outcome domains within the same cohort of individuals, service utilization and chronic disease patterns, and life course and transgenerational transmission of health. Limited information on variables such as individual-level socioeconomic status (SES) and social supports is outweighed by strengths that include comprehensive follow-up, continuous data collection, objective measures, and relatively low expense. Ever advancing methodologies and data holdings guarantee that research using linked administrative databases will make increasingly important contributions to public health research.

            A decade of data linkage in Western Australia: strategic design, applications and benefits of the WA data linkage system.

            The report describes the strategic design, steps to full implementation and outcomes achieved by the Western Australian Data Linkage System (WADLS), instigated in 1995 to link up to 40 years of data from over 30 collections for an historical population of 3.7 million. Staged development has seen its expansion, initially from a linkage key to local health data sets, to encompass links to national and local health and welfare data sets, genealogical links and spatial references for mapping applications. The WADLS has supported over 400 studies with over 250 journal publications and 35 graduate research degrees. Applications have occurred in health services utilisation and outcomes, aetiologic research, disease surveillance and needs analysis, and in methodologic research. Longitudinal studies have become cheaper and more complete; deletion of duplicate records and correction of data artifacts have enhanced the quality of information assets; data linkage has conserved patient privacy; community machinery necessary for organised responses to health and social problems has been exercised; and the commercial return on research infrastructure investment has exceeded 1000%. Most importantly, there have been unbiased contributions to medical knowledge and identifiable advances in population health arising from the research.

              Describing the linkages of the immigration, refugees and citizenship Canada permanent resident data and vital statistics death registry to Ontario’s administrative health database

              Background: Ontario, the most populous province in Canada, has a universal healthcare system that routinely collects health administrative data on its 13 million legal residents; these data are used for health research. Record linkage has become a vital tool for this research by enriching these data with the Immigration, Refugees and Citizenship Canada Permanent Resident (IRCC-PR) database and the Office of the Registrar General’s Vital Statistics-Death (ORG-VSD) registry. Our objectives were to estimate linkage rates and compare characteristics of individuals in the linked versus unlinked files.
              Methods: We used both deterministic and probabilistic linkage methods to link the IRCC-PR database (1985–2012) and ORG-VSD registry (1990–2012) to Ontario’s Registered Persons Database. Linkage rates were estimated and standardized differences were used to assess differences in socio-demographic and other characteristics between the linked and unlinked records.
              Results: The overall linkage rates for the IRCC-PR database and ORG-VSD registry were 86.4 and 96.2 %, respectively. The majority (68.2 %) of the record linkages in IRCC-PR were achieved after three deterministic passes, 18.2 % were linked probabilistically, and 13.6 % were unlinked. Similarly, the majority (79.8 %) of the records in the ORG-VSD were linked using deterministic record linkage, 16.3 % were linked after probabilistic linkage and manual review, and 3.9 % were unlinked. Unlinked and linked files were similar for most characteristics, such as age and marital status for IRCC-PR and sex and most causes of death for ORG-VSD. However, lower linkage rates were observed among people born in East Asia (78 %) in the IRCC-PR database and for certain causes of death in the ORG-VSD registry, namely perinatal conditions (61.3 %) and congenital anomalies (81.3 %).
              Conclusions: The linkages of immigration and vital statistics data to existing population-based healthcare data in Ontario, Canada will enable many novel cross-sectional and longitudinal studies to be conducted. Analytic techniques to account for sub-optimal linkage rates may be required in studies of certain ethnic groups or certain causes of death among children and infants.
              Electronic supplementary material: The online version of this article (doi:10.1186/s12911-016-0375-3) contains supplementary material, which is available to authorized users.
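The abstract above compares linked and unlinked records using standardized differences. A minimal sketch of that calculation follows, assuming the usual pooled-variance definitions for a proportion and for a mean. The function names and the input figures are hypothetical and are not taken from the study.

```python
import math


def std_diff_proportion(p_linked, p_unlinked):
    """Standardized difference between two proportions, each in [0, 1]."""
    pooled_var = (p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2
    if pooled_var == 0:
        # Both groups are all-0 or all-1; avoid dividing by zero.
        if p_linked == p_unlinked:
            return 0.0
        return math.copysign(math.inf, p_linked - p_unlinked)
    return (p_linked - p_unlinked) / math.sqrt(pooled_var)


def std_diff_mean(mean_linked, sd_linked, mean_unlinked, sd_unlinked):
    """Standardized difference between two means, using the pooled SD."""
    pooled_sd = math.sqrt((sd_linked ** 2 + sd_unlinked ** 2) / 2)
    if pooled_sd == 0:
        if mean_linked == mean_unlinked:
            return 0.0
        return math.copysign(math.inf, mean_linked - mean_unlinked)
    return (mean_linked - mean_unlinked) / pooled_sd


# Made-up figures: 30% of linked and 40% of unlinked records share a trait.
d = std_diff_proportion(0.30, 0.40)
print(round(d, 3))
```

An absolute standardized difference above roughly 0.1 is often read as a meaningful imbalance between groups. That cut-off is a common convention rather than something stated in the abstract. Unlike a significance test, the measure does not grow with sample size, which suits comparisons across very large administrative files.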

                Author and article information

                Journal
                101648833
                43393
                Big Data Soc
                Big Data & Society
                2053-9517
                24 October 2018
                5 December 2017
                29 October 2018
                Volume: 4
                Issue: 2
                Article number: 2053951717745678
                Affiliations
                [1] Department of Health Services Research and Policy, London School of Hygiene & Tropical Medicine, London, UK
                [2] Institute of Geography and the Lived Environment, University of Edinburgh, Edinburgh, UK
                [3] Centre for Population Health Research, Curtin University, Perth, Australia
                [4] CHESS Karolinska Institutet, Stockholm University, Stockholm, Sweden
                [5] Institute for Clinical Evaluative Sciences, Toronto, Canada
                [6] Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, CEP 41745-715 Salvador-Bahia, Brazil
                [7] Graduate School of Education, University of Bristol, Bristol, UK, and UCL Great Ormond Street Institute of Child Health, London, UK
                Author notes
                Corresponding author: Katie Harron, 15-17 Tavistock Place, London WC1H 9SH, UK. Katie.harron@lshtm.ac.uk
                Article
                EMS80105
                10.1177/2053951717745678
                6187070
                30381794
                58a47ace-31fb-4ea2-96b6-fe68943512a1

                Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

                Categories
                Article

                data linkage, record linkage, epidemiological studies, measurement error, selection bias, data accuracy, administrative data
