
      Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors

      Preprint

          Abstract

          In the wake of responsible AI, interpretability methods, which attempt to provide an explanation for the predictions of neural models, have seen rapid progress. In this work, we are concerned with explanations that are applicable to natural language processing (NLP) models and tasks, and we focus specifically on the analysis of counterfactual, contrastive explanations. We note that while several explainers have been proposed to produce counterfactual explanations, their behaviour can vary significantly, and the lack of a universal ground truth for counterfactual edits imposes an insuperable barrier on their evaluation. We propose a new back-translation-inspired evaluation methodology that utilises earlier outputs of the explainer as ground-truth proxies to investigate the consistency of explainers. We show that by iteratively feeding the counterfactual to the explainer, we can obtain valuable insights into the behaviour of both the predictor and the explainer models, and infer patterns that would otherwise be obscured. Using this methodology, we conduct a thorough analysis and propose a novel metric to evaluate the consistency of counterfactual generation approaches with different characteristics across the available performance indicators.
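
          The iterative procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the explainer and predictor callables, the token-overlap proxy, and the number of rounds are all hypothetical choices made for the example. The counterfactual produced in one round becomes the input (and ground-truth proxy) for the next.

from typing import Callable, Dict, List

def iterative_counterfactuals(
    text: str,
    explainer: Callable[[str], str],  # returns a counterfactual edit of its input (assumed interface)
    predictor: Callable[[str], int],  # the classifier whose predictions are being explained (assumed interface)
    n_rounds: int = 3,
) -> List[Dict]:
    """Feed the explainer its own counterfactuals and log per-round statistics."""
    history: List[Dict] = []
    current = text
    for round_idx in range(n_rounds):
        edited = explainer(current)
        prev_tokens, new_tokens = set(current.split()), set(edited.split())
        history.append({
            "round": round_idx,
            "input": current,
            "counterfactual": edited,
            # did the edit actually flip the predictor's label?
            "label_flipped": predictor(edited) != predictor(current),
            # token overlap with the previous text: a crude stand-in for a
            # consistency / minimality indicator, not the paper's metric
            "overlap": len(prev_tokens & new_tokens) / max(len(prev_tokens), 1),
        })
        current = edited  # the counterfactual becomes the next round's input
    return history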

          Author and article information

          Journal
          Date: 26 May 2023
          Article: 2305.17055 (arXiv preprint)
          Record ID: 7d400637-1de7-45e2-afe8-d9da78f8cb58
          License: http://creativecommons.org/licenses/by-nc-nd/4.0/ (CC BY-NC-ND 4.0)

          History
          Custom metadata: Accepted at Findings of ACL 2023
          Subject: cs.CL (Computation and Language)

          Theoretical computer science
