
      Evaluating Common De-Identification Heuristics for Personal Health Information

      Khaled El Emam, BEng, PhD 1; Sam Jabbouri, BSc 2; Scott Sams, BPAPM 3; Youenn Drouet, MSc 4; Michael Power, LLB 5


      Journal of Medical Internet Research

      Gunther Eysenbach

      Privacy, confidentiality, HIPAA, security, data disclosure, ethics




          With the growing adoption of electronic medical records, there are increasing demands for the use of this electronic clinical data in observational research. A frequent ethics board requirement for such secondary use of personal health information in observational research is that the data be de-identified. De-identification heuristics are provided in the Health Insurance Portability and Accountability Act Privacy Rule, funding agency and professional association privacy guidelines, and common practice.


          The aim of the study was to evaluate whether the re-identification risks due to record linkage are sufficiently low when following common de-identification heuristics and whether the risk is stable across sample sizes and data sets.


          Two methods were followed to construct identification data sets. Re-identification attacks were simulated on these. For each data set we varied the sample size down to 30 individuals, and for each sample size evaluated the risk of re-identification for all combinations of quasi-identifiers. The combinations of quasi-identifiers that were low risk more than 50% of the time were considered stable.
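
          The simulation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not define the risk metric, so the sketch assumes that a sampled record is at risk when its quasi-identifier values match exactly one record in the identification data set. The field names, function names, and toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import random


def reidentification_risk(sample, identification_db, quasi_identifiers):
    """Proportion of sample records that match exactly one record in the
    identification data set on the given quasi-identifiers.
    (Assumed metric; the abstract does not specify one.)"""
    if not sample:
        return 0.0

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    # Count how many identification records share each combination of values.
    counts = Counter(key(record) for record in identification_db)
    unique_matches = sum(1 for record in sample if counts[key(record)] == 1)
    return unique_matches / len(sample)


def risk_by_combination(sample, identification_db, fields):
    """Evaluate the risk for every non-empty combination of quasi-identifiers."""
    results = {}
    for size in range(1, len(fields) + 1):
        for combo in combinations(fields, size):
            results[combo] = reidentification_risk(sample, identification_db, combo)
    return results


# Toy identification data set (hypothetical records, not from the study).
identification_db = [
    {"region": "K1", "gender": "F", "year_of_birth": 1960},
    {"region": "K1", "gender": "F", "year_of_birth": 1971},
    {"region": "K1", "gender": "M", "year_of_birth": 1960},
    {"region": "K2", "gender": "M", "year_of_birth": 1960},
    {"region": "K2", "gender": "M", "year_of_birth": 1982},
    {"region": "K2", "gender": "F", "year_of_birth": 1982},
]

# The disclosed sample is drawn from the same population.
random.seed(0)
sample = random.sample(identification_db, 3)

risks = risk_by_combination(
    sample, identification_db, ["region", "gender", "year_of_birth"]
)
for combo, risk in sorted(risks.items(), key=lambda item: item[1]):
    print(combo, round(risk, 2))
```

          In this toy data set every record is unique on the full combination of region, gender, and year of birth, so that combination carries the highest risk, while region alone or gender alone matches several records each.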


          The identification data sets we were able to construct were the list of all physicians and the list of all lawyers registered in Ontario, using 1% sampling fractions. The quasi-identifiers of region, gender, and year of birth were found to be low risk more than 50% of the time across both data sets. The combination of gender and region was also found to be low risk more than 50% of the time. We were not able to create an identification data set for the whole population.
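
          The "low risk more than 50% of the time" criterion used in these results can be sketched as a simple majority check across sample sizes. This is an illustration under stated assumptions: the low-risk threshold, the function names, and the risk values below are hypothetical and do not come from the study.

```python
def low_risk_fraction(risk_by_sample_size, threshold):
    """Fraction of evaluated sample sizes whose risk is at or below the threshold."""
    if not risk_by_sample_size:
        raise ValueError("at least one sample size is required")
    low = sum(1 for risk in risk_by_sample_size.values() if risk <= threshold)
    return low / len(risk_by_sample_size)


def is_stable(risk_by_sample_size, threshold, majority=0.5):
    """A combination counts as stable when it is low risk strictly more than
    `majority` of the time, mirroring the 50% criterion in the abstract."""
    return low_risk_fraction(risk_by_sample_size, threshold) > majority


# Hypothetical risk values per sample size for two quasi-identifier combinations.
THRESHOLD = 0.2  # assumed cut-off for "low risk"; not stated in the abstract
combo_a = {30: 0.30, 100: 0.15, 500: 0.10, 1000: 0.05}
combo_b = {30: 0.40, 100: 0.25, 500: 0.10, 1000: 0.05}

print("combo_a stable:", is_stable(combo_a, THRESHOLD))
print("combo_b stable:", is_stable(combo_b, THRESHOLD))
```

          The strict inequality matters: a combination that is low risk in exactly half of the sample sizes does not meet a "more than 50%" criterion.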


          Existing Canadian federal and provincial privacy laws help explain why it is difficult to create an identification data set for the whole population. That such examples of high re-identification risk exist for mainstream professions makes a strong case for not disclosing the high-risk variables and their combinations identified here. For professional subpopulations with published membership lists, many variables often needed by researchers would have to be excluded or generalized to ensure consistently low re-identification risk. Data custodians and researchers need to consider other statistical disclosure techniques for protecting privacy.


                Author and article information

                J Med Internet Res
                Journal of Medical Internet Research
                Gunther Eysenbach (Centre for Global eHealth Innovation, Toronto, Canada)
                Oct-Dec 2006
                21 November 2006
                Volume: 8
                Issue: 4
                1 University of Ottawa and CHEO Research Institute, Ottawa, ON, Canada
                2 School of Computer Science, Carleton University, Ottawa, ON, Canada
                3 Department of Geography and Environment, London School of Economics and Political Science, London, UK
                4 Departement d’Informatique et de Statistique, Faculte de Sciences Economiques et de Gestion, Universite Lumiere Lyon 2, Lyon, France
                5 Gowling Lafleur Henderson LLP, Ottawa, ON, Canada
                © Khaled El Emam, Sam Jabbouri, Scott Sams, Youenn Drouet, Michael Power. Originally published in the Journal of Medical Internet Research, 21.11.2006. Except where otherwise noted, articles published in the Journal of Medical Internet Research are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited, including full bibliographic details and the URL (see "please cite as" above), and this statement is included.
                Original Paper



