      Estimating the success of re-identifications in incomplete datasets using generative models

      research-article

          Abstract

          While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.
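
As a rough illustration of what "individual uniqueness" means here, the following Python sketch computes the probability that a record is unique in a population. It rests on a simplifying assumption that is not taken from the text above: the n individuals are drawn independently, and a given attribute combination occurs with a known probability p. Under that assumption, a person with that combination is unique when none of the other n - 1 people share it, which happens with probability (1 - p)^(n - 1). The function name, the values of p, and the population size are hypothetical; in practice p would have to come from a fitted model such as the copula-based one the abstract describes.

```python
import math


def uniqueness_likelihood(p: float, n: int) -> float:
    """Probability that nobody else among n i.i.d. individuals shares a record.

    p is the (assumed known) probability that a random individual has this
    exact attribute combination; n is the population size, including the
    individual in question.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return 1.0  # a population of one is trivially unique
    if p == 1.0:
        return 0.0  # everyone shares the record
    # (1 - p) ** (n - 1), computed in log space to stay accurate for tiny p
    return math.exp((n - 1) * math.log1p(-p))


if __name__ == "__main__":
    population = 1_000_000  # hypothetical population size
    for p in (1e-9, 1e-6, 1e-5):  # hypothetical record probabilities
        print(f"p={p:g}: P(unique) = {uniqueness_likelihood(p, population):.4f}")
```

Each additional attribute can only make a specific combination rarer, so p shrinks and the uniqueness probability moves toward 1. This is the intuition behind the abstract's claim that a modest number of demographic attributes is enough to single out most individuals.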

          Abstract

          Anonymization has been the main means of addressing privacy concerns in sharing medical and socio-demographic data. Here, the authors estimate the likelihood that a specific person can be re-identified in heavily incomplete datasets, casting doubt on the adequacy of current anonymization practices.

                Author and article information

                Contributors
                deMontjoye@imperial.ac.uk
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 23 July 2019
                Volume: 10
                Article number: 3069
                Affiliations
                [1] ISNI 0000 0001 2294 713X, GRID grid.7942.8, Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium
                [2] ISNI 0000 0001 2113 8111, GRID grid.7445.2, Department of Computing, Imperial College London, London, SW7 2AZ, UK
                [3] ISNI 0000 0001 2113 8111, GRID grid.7445.2, Data Science Institute, Imperial College London, London, SW7 2AZ, UK
                Author information
                http://orcid.org/0000-0002-9956-1187
                Article
                10933
                10.1038/s41467-019-10933-3
                6650473
                31337762
                d5b6369d-a992-487d-a774-eba7cbc79a5c
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 27 September 2018
                : 11 June 2019
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100002661, Fonds De La Recherche Scientifique - FNRS (Belgian National Fund for Scientific Research);
                Funded by: FundRef https://doi.org/10.13039/501100010978, Wallonie-Bruxelles International (WBI);
                Funded by: FundRef https://doi.org/10.13039/501100000761, Imperial College London;
                Categories
                Article
                Uncategorized
                computational science, social sciences
