1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Regularizing Black-box Models for Improved Interpretability

      Preprint
      , , ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Most work on interpretability in machine learning has focused on designing either inherently interpretable models, that typically trade-off interpretability for accuracy, or post-hoc explanation systems, that lack guarantees about their explanation quality. We propose an alternative to these approaches by directly regularizing a black-box model for interpretability at training time. Our approach explicitly connects three key aspects of interpretable machine learning: the model's innate explainability, the explanation system used at test time, and the metrics that measure explanation quality. Our regularization results in substantial (up to orders of magnitude) improvement in terms of explanation fidelity and stability metrics across a range of datasets, models, and black-box explanation systems. Remarkably, our regularizers also slightly improve predictive accuracy on average across the nine datasets we consider. Further, we show that the benefits of our novel regularizers on explanation quality provably generalize to unseen test points.

          Related collections

          Most cited references3

          • Record: found
          • Abstract: not found
          • Conference Proceedings: not found

          Intelligible Models for HealthCare

            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Jet-images — deep learning edition

              Bookmark
              • Record: found
              • Abstract: not found
              • Conference Proceedings: not found

              Improving the Robustness of Deep Neural Networks via Stability Training

                Bookmark

                Author and article information

                Journal
                18 February 2019
                Article
                1902.06787
                afbd5cca-cf62-462a-be4e-9dc45c3a21d3

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.LG stat.ML

                Machine learning,Artificial intelligence
                Machine learning, Artificial intelligence

                Comments

                Comment on this article