
      Efficient differentially private learning improves drug sensitivity prediction

      research-article


          Abstract

          Background

          Users of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but the genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a potentially promising solution: privacy is considered sufficient if the presence of any individual patient in the data cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities.
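
The indistinguishability requirement mentioned in the Background is usually formalised as ε-differential privacy. As a reminder, in standard textbook notation (the symbols \mathcal{M}, D, D', S and \varepsilon are introduced here for illustration and do not appear in the abstract), a randomised algorithm \mathcal{M} is ε-differentially private if, for all data sets D and D' that differ in the record of one individual and for all sets of outputs S:

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{\varepsilon}\, \Pr\left[\mathcal{M}(D') \in S\right]
```

A smaller \varepsilon gives a stronger guarantee, because the output distributions with and without any single patient become harder to tell apart.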

          Results

          We show that useful predictors can be learned under powerful differential privacy guarantees, even from moderately-sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of the state-of-the-art non-private lasso regression using only 4x more samples under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information: decreasing the dimensionality and projecting outliers to fit tighter bounds both reduce the amount of noise that must be added for equal privacy.
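
The link between tighter bounds and less noise can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain Laplace mechanism applied to the sufficient statistics of a ridge regression, an equal split of the privacy budget between the two statistics, and neighbouring data sets that differ by adding or removing one row. The function name `private_linear_regression` and the parameters `bound_x`, `bound_y` and `ridge` are hypothetical names chosen for this example.

```python
import numpy as np


def private_linear_regression(X, y, epsilon, bound_x, bound_y, ridge=1.0, rng=None):
    """Illustrative epsilon-DP ridge regression via perturbed sufficient statistics.

    Assumptions (not taken from the article): Laplace mechanism, budget split
    equally between X^T X and X^T y, add/remove-one-row neighbourhood.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]

    # Project outliers onto the bounds so every entry has a known maximum size.
    Xc = np.clip(X, -bound_x, bound_x)
    yc = np.clip(y, -bound_y, bound_y)

    # Sufficient statistics of linear regression.
    xx = Xc.T @ Xc
    xy = Xc.T @ yc

    # L1 sensitivities: one row changes each unique entry of X^T X by at most
    # bound_x**2 and each entry of X^T y by at most bound_x * bound_y.
    sens_xx = d * (d + 1) / 2 * bound_x**2
    sens_xy = d * bound_x * bound_y

    # Laplace noise with scale = sensitivity / (budget share). Noise is drawn
    # for the upper triangle only and mirrored, so the matrix stays symmetric.
    upper = np.triu(rng.laplace(scale=sens_xx / (epsilon / 2), size=(d, d)))
    noise_xx = upper + np.triu(upper, k=1).T
    noise_xy = rng.laplace(scale=sens_xy / (epsilon / 2), size=d)

    # Ridge term keeps the noisy system better conditioned.
    return np.linalg.solve(xx + noise_xx + ridge * np.eye(d), xy + noise_xy)
```

In this sketch the noise scale grows with the number of dimensions `d` and with the bounds, so both a lower dimensionality and tighter clipping bounds reduce the noise needed for the same `epsilon`, at the cost of some distortion of the clipped data.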

          Conclusions

          The proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even with moderate-sized data. As even this simple-to-implement method already shows promise on challenging genomic data, we anticipate rapid progress towards practical applications in many fields.

          Reviewers

          This article was reviewed by Zoltan Gaspari and David Kreil.

          Electronic supplementary material

          The online version of this article (10.1186/s13062-017-0203-4) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                antti.honkela@helsinki.fi
                nmrinl@gmail.com
                arttu.nieminen@helsinki.fi
                onur.dikmen@helsinki.fi
                samuel.kaski@aalto.fi
                Journal
                Biology Direct (Biol Direct)
                BioMed Central (London)
                ISSN 1745-6150
                6 February 2018
                2018; 13: 1
                Affiliations
                [1] ISNI 0000 0004 0410 2071, GRID grid.7737.4, Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
                [2] ISNI 0000 0004 0410 2071, GRID grid.7737.4, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
                [3] ISNI 0000 0004 0410 2071, GRID grid.7737.4, Department of Public Health, University of Helsinki, Helsinki, Finland
                [4] ISNI 0000 0001 0838 9418, GRID grid.5373.2, Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Helsinki, Finland
                Author information
                http://orcid.org/0000-0003-1925-9154
                Article
                203
                DOI: 10.1186/s13062-017-0203-4
                PMCID: PMC5801888
                PMID: 29409513
                8a054c5e-18c9-4ba4-b5e7-8de08806984b
                © The Author(s) 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 6 July 2017
                Accepted: 21 December 2017
                Categories
                Research

                Life sciences
                Keywords: differential privacy, linear regression, drug sensitivity prediction, machine learning
