      A method for managing re-identification risk from small geographic areas in Canada

      research-article

          Abstract

          Background

          A common disclosure control practice for health datasets is to identify small geographic areas and either suppress records from these small areas or aggregate them into larger ones. A recent study provided a method for deciding when an area is too small based on the uniqueness criterion. The uniqueness criterion stipulates that an area is no longer too small when the proportion of unique individuals on the relevant variables (the quasi-identifiers) approaches zero. However, a uniqueness value of zero is quite a stringent threshold, and is only suitable when the risks from data disclosure are quite high. Other uniqueness thresholds that have been proposed for health data are 5% and 20%.
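
As an illustration of the uniqueness criterion described above, the following is a minimal sketch, not the authors' implementation. It assumes uniqueness is measured as the share of records whose combination of quasi-identifier values occurs exactly once within an area; the function name `uniqueness`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

def uniqueness(records, quasi_identifiers):
    """Proportion of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    # Build one key per record from its quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical records for a single small area (not data from the study).
records = [
    {"age": 34, "sex": "F", "language": "en"},
    {"age": 34, "sex": "F", "language": "en"},
    {"age": 71, "sex": "M", "language": "fr"},
    {"age": 52, "sex": "F", "language": "en"},
]
u = uniqueness(records, ["age", "sex", "language"])
print(u)  # two of the four records are unique, so 0.5
```

Under this reading, an area would count as "too small" at a given threshold whenever `u` exceeds that threshold (0, 0.05, or 0.20).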

          Methods

          We estimated uniqueness for urban Forward Sortation Areas (FSAs) by using the 2001 long form Canadian census data representing 20% of the population. We then constructed two logistic regression models to predict when the uniqueness is greater than the 5% and 20% thresholds, and validated their predictive accuracy using 10-fold cross-validation. Predictor variables included the population size of the FSA and the maximum number of possible values on the quasi-identifiers (the number of equivalence classes).
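
The two predictors and the two binary outcomes named above could be assembled roughly as follows. This is a sketch under stated assumptions: it reads "the maximum number of possible values on the quasi-identifiers" as the product of the number of distinct values each quasi-identifier can take, which may not match the authors' exact definition. The helper names, variable names, and numbers are hypothetical, and the logistic regression fit itself is not shown.

```python
from math import prod

def max_equivalence_classes(domain_sizes):
    """Upper bound on distinct quasi-identifier combinations:
    the product of the per-variable value counts (an assumed definition)."""
    return prod(domain_sizes.values())

def threshold_labels(uniqueness, thresholds=(0.05, 0.20)):
    """Binary outcome per threshold: does the area's uniqueness exceed it?"""
    return {t: uniqueness > t for t in thresholds}

# Hypothetical quasi-identifiers and value counts (not from the paper).
domain_sizes = {"age_group": 10, "sex": 2, "language": 3}

# One row of predictors for a hypothetical FSA.
area = {
    "population": 4200,
    "max_classes": max_equivalence_classes(domain_sizes),
}

# Outcomes for a hypothetical estimated uniqueness of 12%.
labels = threshold_labels(0.12)
print(area["max_classes"], labels)
```

Each threshold gets its own outcome column, which is consistent with fitting one model per threshold as the abstract describes.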

          Results

          All model parameters were significant and the models had very high prediction accuracy, with specificity above 0.9, and sensitivity of 0.87 and 0.74 for the 5% and 20% threshold models, respectively. The application of the models was illustrated with an analysis of the Ontario newborn registry and an emergency department dataset. At the higher thresholds, considerably fewer records than at the 0% threshold would be considered to be in small areas and would therefore undergo disclosure control actions. We have also included concrete guidance for data custodians on deciding which of the three uniqueness thresholds to use (0%, 5%, 20%), depending on the mitigating controls that the data recipients have in place, the potential invasion of privacy if the data are disclosed, and the motives and capacity of the data recipient to re-identify the data.

          Conclusion

          The models we developed can be used to manage the re-identification risk from small geographic areas. Being able to choose among three possible thresholds, a data custodian can adjust the definition of "small geographic area" to the nature of the data and recipient.

                Author and article information

                Journal
                BMC Medical Informatics and Decision Making (BMC Med Inform Decis Mak)
                Publisher: BioMed Central
                ISSN: 1472-6947
                Published: 2 April 2010
                Volume: 10
                Article number: 18
                Affiliations
                [1 ]Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario K1J 8L1, Canada
                [2 ]Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
                [3 ]GIS Infrastructure, Office of Public Health Practice, Public Health Agency of Canada, Ottawa, Ontario K1A 0K9, Canada
                [4 ]Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
                [5 ]Children's Hospital of Eastern Ontario, 401 Smyth Road, Ottawa, Ontario K1J 8L1, Canada
                Article
                Publisher ID: 1472-6947-10-18
                DOI: 10.1186/1472-6947-10-18
                PMC ID: 2858714
                PubMed ID: 20361870
                Copyright ©2010 El Emam et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 28 May 2009
                : 2 April 2010
                Categories
                Research Article

                Bioinformatics & Computational biology