End-of-life (EOL) decision making in the intensive care unit (ICU) is challenging
for both families and clinicians. This decision-making process is ideally framed around
a shared understanding of a patient’s values and goals, all taken in the context of
their critical illness and prognosis. However, clinicians commonly face uncertainty
regarding prognosis and may have difficulty offering families an accurate assessment
of the likely outcomes of treatment decisions. Adding to the complexity of these scenarios,
clinicians, patients and families are each susceptible to unconscious but influential
cognitive biases when making decisions under stress. Given these challenges, and a
rapidly growing interest in data science to inform care in the ICU, investigators
have explored the use of prediction models (eg, machine learning or ML algorithms)
to assist with prognostication.1–3 Prediction models describe an outcome distribution
among individuals with a particular set of characteristics, such as risk of acute
kidney injury among individuals with particular laboratory values and clinical characteristics
in a population. However, they do not compare how that outcome distribution would
change were different treatment decisions made in that population—this requires causal
effect estimation, rather than prediction modelling. Herein, we explain why prediction
modelling alone is not sufficient to inform many ICU treatment decisions, including
EOL decision making, and describe why causal effect estimation is necessary.
Consider the following case in which a prediction model is used, rather than causal
effect estimation: a 68-year-old man is admitted to the ICU with severe pneumonia
and requires mechanical ventilation. After 5 days, he requires continued full support
from the ventilator and has developed delirium. His family is concerned about prolonging
intensive care but worries about transitioning to comfort measures prematurely if
continued intensive care could, in fact, achieve their goals. His family and clinicians
would like to use the best available evidence to inform their decision. Given an increasing
interest in mortality prediction models, the clinical team explores this as a tool
for decision support. Their chosen algorithm is a prediction model (trained on available
data containing measurements of treatments, outcomes and other characteristics of
previously ventilated patients), which returns a 70% probability of death within the
next 30 days.
The first question we must ask is: what is the precise interpretation of this prediction?
This is an estimate of the probability of death for a population of patients ‘like
this patient’—patients mechanically ventilated for 5 days with similar baseline characteristics
and clinical risk factors up to that moment in their ICU course. Importantly, this
probability is contingent on the treatment decisions, after day 5, that were made
in the population from which the training data came. For example, if all patients
similar to this one in the training data transitioned after day 5 to comfort measures
only, then the algorithm would predict a 100% risk of 30-day mortality. Alternatively,
if most of the patients similar to this index case in the training data pursued
tracheostomy, the model may predict a low 30-day mortality. These two extreme
conditions demonstrate that the interpretation is highly dependent on the distribution
of treatment decisions among patients ‘like this patient’ in the training data.
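To make this dependence concrete, the following is a minimal simulation sketch, not an analysis of real data. The two mortality risks and the function name `predicted_mortality` are hypothetical values chosen only for illustration; the 'prediction' is simply the observed death rate among simulated patients 'like this patient'.

```python
import random

# Hypothetical 30-day mortality risks, chosen only for illustration.
RISK_COMFORT = 1.0    # after transition to comfort measures only
RISK_CONTINUE = 0.4   # after continued intensive care


def predicted_mortality(share_comfort, n=20000, seed=0):
    """Observed 30-day death rate among simulated patients 'like this patient'
    when a fraction `share_comfort` of them transitioned to comfort measures."""
    rng = random.Random(seed)
    deaths = 0
    for _ in range(n):
        comfort = rng.random() < share_comfort
        risk = RISK_COMFORT if comfort else RISK_CONTINUE
        deaths += rng.random() < risk
    return deaths / n


# The same kind of patient receives a different prediction depending only on
# how similar patients in the training data were treated after day 5.
for share in (0.0, 0.5, 1.0):
    print(f"share on comfort measures = {share:.1f}: "
          f"predicted mortality = {predicted_mortality(share):.2f}")
```

Nothing about the patient changes between the three runs; only the treatment mix in the simulated training data does.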
Second, and more importantly, we consider the following question: how is this probability
useful (or not useful) to both the family and the clinicians? We must first clearly
articulate the question that clinicians and family members are truly interested in
answering for this patient. They are not simply concerned about predicting whether survival
is possible. Instead, they want to know what the outcome would be if
one treatment strategy were chosen compared with the outcome if a different treatment
strategy were chosen. To be more concrete, they may want to know whether continued mechanical
ventilation and attempts at ventilator liberation for another week would result in
a different 30-day survival than a more limited trial of 48 hours of ventilation.
They are concerned with the balance between unnecessarily prolonging intensive care
versus a missed opportunity for survival if they transition to comfort measures prematurely.
More importantly, in the context of the patient’s values and goals, they likely want
to understand the effect of these treatment strategies on long-term quality-of-life.
Contrasting causal effects with prediction models
We refer to the outcome that would have been observed had, perhaps contrary to fact,
a particular treatment been given, as the counterfactual outcome under that treatment.
Using our earlier clinical scenario as an example, if our critically ill patient had
decided to undergo tracheostomy on day 7, we might wonder ‘what would have happened
if he had not undergone tracheostomy and had instead remained intubated’. The outcome under
our ‘what if’ scenario, which is contrary to what actually happened, is the counterfactual
outcome. Causal effects compare counterfactual outcomes for a person (or a population)
under different treatment strategies, asking ‘what would be the outcome if we choose
treatment A compared to the outcome if we choose treatment B’. Clinicians, patients
and families intuitively think in counterfactuals when weighing the risks and benefits
of different decisions, including EOL decision making. Thus, causal estimates would
seem to be the natural approach to support decision making. Yet prediction models,
rather than causal estimates, have received rapidly growing attention in the literature
while their limitations are often overlooked.
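The distinction can be written compactly. The notation here is our assumption for illustration: Y is death within 30 days, A is the treatment strategy (a = 1 or a = 0), L is the set of patient characteristics, and Y^a is the counterfactual outcome under strategy a.

```latex
\[
\underbrace{\Pr[\,Y = 1 \mid L = l\,]}_{\text{prediction}}
\qquad \text{versus} \qquad
\underbrace{\Pr[\,Y^{a=1} = 1 \mid L = l\,] \;-\; \Pr[\,Y^{a=0} = 1 \mid L = l\,]}_{\text{causal effect}}
\]
```

The first quantity averages over whatever treatments patients with characteristics l happened to receive; the second compares the same patients under two specified strategies.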
In contrast to causal estimates, mortality prediction models simply map inputs (or
‘features’) to a chosen outcome, such as mortality. They might help estimate whether a
patient is at a higher risk of death, but they offer little help in making the best
decision in that scenario. Moreover, as we noted previously, these estimates depend
on the distribution of treatments that were given to patients like ours in the training
data. As such, if historic treatment distributions differ from those in current practice,
then the prediction may be inaccurate for current patients.
The appeal of predictive approaches, from a data science perspective, is that they
can be readily applied with existing healthcare data and established machine learning
algorithms. However, these models ignore assumptions about causal structure, that is, the relationships
between the variables, which can only be informed by expert knowledge. For example,
users of a prediction model may claim that a high fraction of inspired oxygen is associated
with higher ICU mortality but that prediction could not justify a claim that an intervention
to reduce the fraction of inspired oxygen administered would reduce mortality without
first reasoning about how these variables are connected to one another.4 Specifically,
they must defend assumptions about how the treatment and outcome are related, causally
or by associational pathways. Because prediction models omit this step, they cannot
provide an estimate of expected outcomes when divergent treatment decisions are chosen.
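A small simulation illustrates why an association alone cannot support a claim about intervention. This sketch assumes a deliberately simple causal structure in which illness severity drives both oxygen requirements and death, and in which the fraction of inspired oxygen has no effect on mortality. That structure, every probability and the function names are hypothetical; none of it is a claim about oxygen therapy.

```python
import random


def simulate_icu(n=50000, seed=1):
    """Simulated patients as (severe, high_fio2, died) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        # Sicker patients are more likely to receive a high fraction of oxygen.
        high_fio2 = rng.random() < (0.8 if severe else 0.2)
        # Mortality depends on severity only: no effect of oxygen by construction.
        died = rng.random() < (0.6 if severe else 0.2)
        rows.append((severe, high_fio2, died))
    return rows


def mortality(rows, high_fio2, severe=None):
    """Death rate among patients with the given oxygen level (and severity)."""
    deaths = [d for s, f, d in rows
              if f == high_fio2 and (severe is None or s == severe)]
    return sum(deaths) / len(deaths)


rows = simulate_icu()
crude_diff = mortality(rows, True) - mortality(rows, False)
diff_severe = mortality(rows, True, severe=True) - mortality(rows, False, severe=True)
diff_not_severe = mortality(rows, True, severe=False) - mortality(rows, False, severe=False)

print(f"crude difference in mortality (high vs low oxygen): {crude_diff:.3f}")
print(f"difference among severe patients:                   {diff_severe:.3f}")
print(f"difference among non-severe patients:               {diff_not_severe:.3f}")
```

In this simulated world a high fraction of inspired oxygen predicts death well, yet lowering it would change nothing. Only an assumption about the causal structure (here, that severity is a common cause) tells the analyst which comparison is meaningful.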
Applying causal inference to ICU data
Having established the need for causal effects of different ICU decisions, rather
than predictions of mortality, we will describe how they can be estimated. An intuitive
and effective approach to designing observational analyses for estimation of causal
effects is to specify a hypothetical pragmatic randomised trial (ie, a ‘target trial’),
one that would answer the question of interest but may be impossible or impractical
to conduct in practice.5 This hypothetical trial helps us be explicit about the important
aspects of our analysis, including the causal question it aims to answer, and helps us avoid
biases introduced by the study design (eg, immortal time bias).6 Specifically, we
need to define the eligibility criteria, the treatment strategies of interest, the
follow-up period (including a clear definition of ‘time zero’, the start of follow-up,
eg, mechanical ventilation day 5 in the above example), the outcomes of interest and
the statistical analysis plan. This also requires expert knowledge of the clinical
context.
We describe one such trial in table 1. This trial is ethically and logistically
infeasible; therefore, an analysis of observational data, designed with the same
features as the trial, is the next best approach. In particular, to emulate the trial
described in table 1, we would, after obtaining appropriate observational data: (1)
restrict our data to individuals meeting the eligibility criteria, (2) classify those
who immediately discontinue mechanical ventilation as adherent to strategy one and
those continuing mechanical ventilation on day 6 as adherent to strategy two and (3)
compare estimates of the risk of 30-day mortality among those adherent to strategy
one versus strategy two, adjusted for measured prebaseline prognostic factors (ie,
measured confounders). Adjustment is required because treatment is not randomly assigned
in observational data (in other words, treatment is related to the outcome via associational
pathways). If all the relevant confounders are measured and adjusted for, then the
same effect estimates will be obtained from the observational data analysis as from
the trial had it been conducted (except for random variation).
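The three emulation steps above could be sketched as follows. This is a toy example under strong assumptions: the cohort is simulated, a single binary variable (`frail`) stands in for all measured confounders, and adjustment is done by standardisation over that variable. All names and numbers are hypothetical.

```python
import random


def simulate_cohort(n=40000, seed=2):
    """Hypothetical observational records; every number is made up."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        frail = rng.random() < 0.4   # measured prebaseline confounder
        # Frail patients are transitioned to comfort care (strategy 1) more often.
        strategy = 1 if rng.random() < (0.5 if frail else 0.1) else 2
        if strategy == 1:
            p_death = 0.95
        else:
            p_death = 0.6 if frail else 0.3
        cohort.append({"age": rng.randint(50, 90), "frail": frail,
                       "strategy": strategy, "died_30d": rng.random() < p_death})
    return cohort


def standardised_risk(rows, strategy):
    """30-day risk under `strategy`, standardised to the confounder
    distribution of all eligible patients."""
    risk = 0.0
    for frail in (True, False):
        stratum = [r for r in rows if r["frail"] == frail]
        adherent = [r for r in stratum if r["strategy"] == strategy]
        stratum_risk = sum(r["died_30d"] for r in adherent) / len(adherent)
        risk += (len(stratum) / len(rows)) * stratum_risk
    return risk


cohort = simulate_cohort()
# Step 1: restrict to individuals meeting the eligibility criteria.
eligible = [r for r in cohort if r["age"] >= 65]
# Steps 2 and 3: classify by strategy and compare confounder-adjusted risks.
risk_comfort = standardised_risk(eligible, strategy=1)
risk_continue = standardised_risk(eligible, strategy=2)
# Unadjusted comparison, shown only for contrast.
continued = [r for r in eligible if r["strategy"] == 2]
crude_continue = sum(r["died_30d"] for r in continued) / len(continued)

print(f"adjusted risk, strategy one: {risk_comfort:.3f}")
print(f"adjusted risk, strategy two: {risk_continue:.3f}")
print(f"crude risk, strategy two:    {crude_continue:.3f}")
```

Because frail patients in the simulated data are more often transitioned to comfort care, the crude risk under continued ventilation is too optimistic; standardising over the confounder corrects this, but only because the confounder was measured.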
Table 1: A hypothetical randomised trial

| Trial component | Description |
| --- | --- |
| Eligibility criteria | Individuals aged 65 years or older admitted to a critical care unit with severe pneumonia requiring intubation who have received 5 days of mechanical ventilation and cannot yet be liberated from the ventilator. |
| Treatment strategies | (1) Immediate transition to comfort care measures. (2) Immediate continuation of mechanical ventilation.* |
| Assignment procedures | Unblinded random assignment to one of the treatment strategies. |
| Follow-up period | Beginning at baseline, the time of randomisation, individuals are followed until death or the end of 30 days.† |
| Outcome | All-cause mortality by the end of 30-day follow-up. |
| Causal contrast of interest | Intention-to-treat effect. |

*Note that these treatment strategies are not sustained because they only direct initial
treatment. In other words, individuals assigned to ‘immediate continuation of mechanical
ventilation’ may later be transitioned to comfort measures.
†For simplicity, we assume no loss to follow-up. This assumption is reasonable for
ventilated patients for an endpoint at 30 days from randomisation.
While this may be a useful example for the technical aspects of this process, the
trial described in table 1 is not the one of interest to decision makers. For example,
treatment strategies such as ‘continue mechanical ventilation for another week, unless
liberation from the ventilator is achieved’ versus ‘continue mechanical ventilation
for another 48 hours, unless liberation from the ventilator is achieved’ address questions
around time-limited trials better than those proposed in table 1. These strategies
are sustained7 because they specify a treatment over time, rather than simply at baseline, and
they are dynamic8 because the treatment assigned under each strategy depends on a patient’s time-evolving
characteristics, such as respiratory status and liberation from the ventilator. Almost
all real-world treatment strategies in the ICU are sustained and dynamic, yet clinical
researchers infrequently apply the methods necessary to account for this.9
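One way to see what 'sustained' and 'dynamic' mean is to write a time-limited strategy as a decision rule that is re-evaluated every day. The sketch below is illustrative only: the function name, the returned labels and, in particular, the assumption that care transitions to comfort measures when the time limit expires without liberation are ours.

```python
def assigned_treatment(days_since_baseline, liberated, trial_length_days):
    """Treatment assigned on a given day under a time-limited strategy."""
    if liberated:
        # Dynamic: the assignment depends on the patient's evolving status.
        return "no ventilation needed"
    if days_since_baseline < trial_length_days:
        # Sustained: the strategy specifies treatment on every day of follow-up.
        return "continue mechanical ventilation"
    # Assumption: the time-limited trial has ended without liberation.
    return "transition to comfort measures"


# Two strategies compared for a patient who is still ventilated on day 3.
print(assigned_treatment(3, liberated=False, trial_length_days=7))  # one more week
print(assigned_treatment(3, liberated=False, trial_length_days=2))  # 48 hours
```

Two patients following the same strategy can therefore receive different treatments on the same day, which is why methods designed for a single baseline treatment do not apply directly.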
In addition to considering different treatment strategies than those in table 1, clinicians
and families are also interested in outcomes other than 30-day survival. For instance,
they may be more interested in quality of life at 6 months. Because the quality of
life at 6 months is not defined among individuals who die before the end of 6 months
of follow-up, defining a meaningful causal effect of interest requires careful handling
of competing events.10
For many clinically relevant questions, causal inference researchers have developed
the methodological tools required for computing the function of the observed data
that identifies the causal effect; that is, the effect that could be directly estimated
from a perfect execution of the target trial. This occurs under assumptions about
causal structure, informed by clinical expertise.9 11 In particular, the function
depends on all of the covariates within the causal structure of the clinical scenario
that are needed for confounding adjustment. Estimating this typically high-dimensional
function of the data does, in fact, involve obtaining a form of predictions as interim
steps. For example, inverse probability weighting, which under particular assumptions
can yield causal effect estimates, requires as an interim step that the probability
of treatment conditional on the confounders be estimated. This estimate needs to
accurately map the confounders to the probability of treatment. These predictions are
used to construct the weights for the final causal effect estimation. Methods that
address sustained and dynamic treatment strategies may incorporate multiple predictions
into an inverse probability weighted approach to account for the time-varying nature
of real-world care. Therefore, while we have argued that prediction modelling is not,
in itself, ideal for ICU decision making, prediction algorithms are necessary interim
steps for obtaining causal effect estimates (which are the basis of such decision
making). Moreover, just as modern machine learning algorithms (eg, neural
networks, random forests and gradient boosting) may perform better than traditional
models (eg, logistic regression) when the end goal is a prediction, these modern algorithms
may ultimately provide better causal effect estimates than traditional models when
used during interim steps.12–14
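The role of prediction as an interim step can be illustrated with a minimal inverse probability weighting sketch for a single baseline treatment decision. It assumes simulated data, one binary measured confounder (`frail`) and a treatment model fitted by simple stratum frequencies; in a real analysis that step would be a regression or machine learning model, and sustained strategies would need weights at multiple time points. All names and numbers are hypothetical.

```python
import random


def simulate(n=40000, seed=3):
    """Simulated patients as (frail, continued, died) tuples; numbers are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        frail = rng.random() < 0.4
        continued = rng.random() < (0.5 if frail else 0.9)
        if continued:
            p_death = 0.6 if frail else 0.3
        else:
            p_death = 0.95
        rows.append((frail, continued, rng.random() < p_death))
    return rows


def fit_treatment_model(rows):
    """Interim prediction step: estimate Pr(continued | frail)."""
    model = {}
    for frail in (True, False):
        treated = [a for l, a, _ in rows if l == frail]
        model[frail] = sum(treated) / len(treated)
    return model


def ipw_risk(rows, model, continued):
    """Weighted 30-day risk had everyone followed the given strategy."""
    numerator = denominator = 0.0
    for frail, a, died in rows:
        if a != continued:
            continue
        p_received = model[frail] if continued else 1 - model[frail]
        weight = 1 / p_received   # inverse probability of the treatment received
        numerator += weight * died
        denominator += weight
    return numerator / denominator


rows = simulate()
model = fit_treatment_model(rows)
risk_continue = ipw_risk(rows, model, continued=True)
risk_comfort = ipw_risk(rows, model, continued=False)
print(f"weighted risk if all continued ventilation:     {risk_continue:.3f}")
print(f"weighted risk if all moved to comfort measures: {risk_comfort:.3f}")
print(f"estimated risk difference:                      {risk_continue - risk_comfort:.3f}")
```

The predictions here are of treatment, not of death, and they serve only to build the weights. The causal interpretation of the weighted contrast still rests on the assumption that all relevant confounders are measured.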
Conclusion
Amidst our enthusiasm to apply machine learning to ICU healthcare data, we should
remember to start with the end in mind—questions that matter to patients and families.
The process requires clinical expertise to identify specific treatment strategies
and outcomes of interest. It also entails close collaboration between clinicians,
data scientists, causal inference experts, patients and families. Rather than prediction
(which might help us identify a problem), we should estimate causal effects (which
help us understand the impact of actions we may take when faced with that problem)
by applying the tools developed by causal inference researchers over the past two
and a half decades. While machine learning plays an important role in this process,
it is relevant only after the careful mapping of a causal structure and consideration
of design elements of a target trial. In doing this, data analysis may begin to complement
the existing sound principles of EOL communication in the ICU and answer many other
important questions faced by clinicians.