Abstract
<p class="first" id="d9366102e87">The use of machine learning to guide clinical decision
making has the potential to
worsen existing health disparities. Several recent works frame the problem as that
of algorithmic fairness, a framework that has attracted considerable attention and
criticism. However, the appropriateness of this framework is unclear due to both ethical
as well as technical considerations, the latter of which include trade-offs between
measures of fairness and model performance that are not well-understood for predictive
models of clinical outcomes. To inform the ongoing debate, we conduct an empirical
study to characterize the impact of penalizing group fairness violations on an array
of measures of model performance and group fairness. We repeat the analysis across
multiple observational healthcare databases, clinical outcomes, and sensitive attributes.
We find that procedures that penalize differences between the distributions of predictions
across groups induce nearly-universal degradation of multiple performance metrics
within groups. On examining the secondary impact of these procedures, we observe heterogeneity
of the effect of these procedures on measures of fairness in calibration and ranking
across experimental conditions. Beyond the reported trade-offs, we emphasize that
analyses of algorithmic fairness in healthcare lack the contextual grounding and causal
awareness necessary to reason about the mechanisms that lead to health disparities,
as well as about the potential of algorithmic fairness methods to counteract those
mechanisms. In light of these limitations, we encourage researchers building predictive
models for clinical use to step outside the algorithmic fairness frame and engage
critically with the broader sociotechnical context surrounding the use of machine
learning in healthcare.
</p>
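To make the class of procedures under study concrete, the sketch below shows one common way a group fairness violation can be penalized during training: a regularization term that shrinks the gap between group-level prediction distributions (here summarized by their means, a standard demographic-parity surrogate). This is an illustrative sketch, not the authors' implementation; the function name, the choice of penalty, and the PyTorch framing are all assumptions.

```python
import torch
import torch.nn.functional as F

def fairness_penalized_loss(logits, labels, group, lam=1.0):
    """Binary cross-entropy plus a demographic-parity-style penalty.

    Illustrative only: the paper studies penalties on differences
    between the distributions of predictions across groups; here that
    difference is summarized by the gap in mean predicted risk.

    logits: model outputs, shape (n,)
    labels: binary outcomes as floats, shape (n,)
    group:  binary sensitive attribute (0/1), shape (n,);
            assumes both groups are present in the batch
    lam:    trade-off between predictive loss and the fairness penalty
    """
    # Standard predictive loss.
    bce = F.binary_cross_entropy_with_logits(logits, labels)

    # Mean predicted risk within each group.
    probs = torch.sigmoid(logits)
    mean_g0 = probs[group == 0].mean()
    mean_g1 = probs[group == 1].mean()

    # Penalize the gap between group-level prediction distributions.
    penalty = (mean_g0 - mean_g1) ** 2
    return bce + lam * penalty
```

Sweeping lam from 0 upward traces out the trade-off the abstract describes: larger values force the groups' prediction distributions closer together, typically at the cost of within-group performance metrics.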