      Is Open Access

      Logistic regression model training based on the approximate homomorphic encryption

      research-article
      BMC Medical Genomics
      BioMed Central
      iDASH Privacy and Security Workshop 2017
      14 October 2017
      Homomorphic encryption, Machine learning, Logistic regression


          Abstract

          Background

          Security concerns have been raised since big data became a prominent tool in data analysis. For instance, many machine learning algorithms aim to generate prediction models from training data that contain sensitive information about individuals. The cryptography community is considering secure computation as a solution for privacy protection. In particular, practical requirements have triggered research on the efficiency of cryptographic primitives.

          Methods

          This paper presents a method to train a logistic regression model without information leakage. We apply the homomorphic encryption scheme of Cheon et al. (ASIACRYPT 2017) for efficient arithmetic over real numbers, and devise a new encoding method to reduce the storage of the encrypted database. In addition, we adapt Nesterov’s accelerated gradient method to reduce the number of iterations as well as the computational cost while maintaining the quality of the output classifier.
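
          The training loop described above can be illustrated with a minimal plaintext sketch. It is not the authors' implementation and performs no encryption. Homomorphic schemes natively support additions and multiplications, so the sketch lets the sigmoid be swapped for a polynomial; the degree-3 least-squares fit on [-8, 8], the learning rate, the iteration count, and all function names are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_poly_sigmoid(degree=3, bound=8.0, n_points=2001):
    """Least-squares polynomial fit of the sigmoid on [-bound, bound].

    Degree and interval are illustrative assumptions, not values from the abstract.
    """
    xs = np.linspace(-bound, bound, n_points)
    return np.polynomial.Polynomial.fit(xs, sigmoid(xs), degree)

def train_nesterov(X, y, n_iters=25, lr=1.0, act=sigmoid):
    """Logistic regression trained with Nesterov's accelerated gradient.

    X: (n, d) feature matrix, y: (n,) labels in {0, 1}.
    act: activation; pass a polynomial to mimic add/multiply-only arithmetic.
    """
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a bias column
    w = np.zeros(d + 1)                    # current weights
    v = np.zeros(d + 1)                    # lookahead (momentum) point
    t = 1.0
    for _ in range(n_iters):
        # Gradient of the average logistic loss, evaluated at the lookahead point.
        grad = Xb.T @ (act(Xb @ v) - y) / n
        w_next = v - lr * grad
        # Standard Nesterov momentum schedule.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

def accuracy(w, X, y):
    """Fraction of samples classified correctly by weights w (bias first)."""
    scores = np.hstack([np.ones((X.shape[0], 1)), X]) @ w
    return float(np.mean((scores > 0) == (y > 0.5)))
```

          Passing `act=fit_poly_sigmoid()` to `train_nesterov` trains with the polynomial stand-in, while the default uses the exact sigmoid, so the two resulting classifiers can be compared on the same data. In an encrypted setting each matrix product and polynomial evaluation would be carried out on ciphertexts; that part is omitted here.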

          Results

          Our method achieves state-of-the-art performance for a homomorphic encryption system in a real-world application. The submission based on this work was selected as the best solution of Track 3 at the iDASH privacy and security competition 2017. For example, it took about six minutes to obtain a logistic regression model from a dataset of 1579 samples, each with 18 features and a binary outcome variable.

          Conclusions

          We present a practical solution for outsourcing analysis tools such as logistic regression while preserving data confidentiality.

          Electronic supplementary material

          The online version of this article (10.1186/s12920-018-0401-7) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                yongsoosong@ucsd.edu
                mrkim@ucsd.edu
                activecondor@snu.ac.kr
                jhcheon@snu.ac.kr
                Journal
                BMC Medical Genomics (BMC Med Genomics)
                BioMed Central (London)
                ISSN 1755-8794
                Published 11 October 2018
                Volume 11, Issue Suppl 4, Article 83
                Issue sponsor: Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.
                Affiliations
                [1] Department of Mathematical Sciences, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea (ISNI 0000 0004 0470 5905, GRID grid.31501.36)
                [2] Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, San Diego, California 92093-0404, USA (ISNI 0000 0001 2107 4242, GRID grid.266100.3)
                [3] Division of Biomedical Informatics, University of California, San Diego, 9500 Gilman Drive, San Diego, California 92093-0728, USA (ISNI 0000 0001 2107 4242, GRID grid.266100.3)
                Article
                401
                10.1186/s12920-018-0401-7
                6180367
                30309349
                c8d21325-9772-45e9-8a3b-27b1a5440730
                © The Author(s) 2018

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                iDASH Privacy and Security Workshop 2017
                Orlando, FL, USA
                14 October 2017
                Categories
                Research
                Genetics
                homomorphic encryption, machine learning, logistic regression
