Measuring the outcome of an intervention is central to
the practice of evidence-based medicine, and most research papers
evaluating patient outcomes now incorporate some form of patient-based
metric, such as questionnaires or performance tests. Once an outcome
has been defined, researchers usually want to know whether any other
factors influence the result. This is typically assessed with regression
analysis.
Regression analysis¹ estimates
the effect of an independent variable (such as age)
on a dependent variable (such as bone mineral density), under the statistical assumption
that all other variables remain fixed. The calculation of the relationship results
in a theoretical straight line, and the correlation coefficient
(r) measures how closely the observed data lie to that
straight line.
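
As a minimal sketch of the calculation described above, the following fits a least-squares straight line and computes the correlation coefficient r. The data are invented purely for illustration (they are not taken from any study), and the function names `fit_line` and `pearson_r` are our own. Here age is treated as the predictor and bone mineral density as the outcome.

```python
from math import sqrt


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only:
# age in years and bone mineral density in g/cm^2.
age = [50, 55, 60, 65, 70, 75, 80]
bmd = [1.02, 0.99, 0.97, 0.91, 0.90, 0.84, 0.82]

slope, intercept = fit_line(age, bmd)
r = pearson_r(age, bmd)
print(f"slope = {slope:.4f} per year, intercept = {intercept:.3f}, r = {r:.3f}")
```

In this invented data set the slope is negative and r is close to -1, meaning the points lie close to a descending straight line.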
In such a linear model, we can judge how well the line fits the
data (‘goodness of fit’) by calculating the coefficient of determination (the
square of the correlation coefficient, R²). R² is
a measure of the proportion of total variation in the dependent
variable that is accounted for by the independent variable. An R² of
1.0 indicates that the data perfectly fit the linear model. Any
R² value less than 1.0 indicates that at least some variability
in the data cannot be accounted for by the model (e.g., an R² of
0.5 indicates that 50% of the variability in the outcome data cannot
be explained by the model).
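
The coefficient of determination can be written as one minus the ratio of the residual variation to the total variation. The sketch below computes it that way; the function name `r_squared` and the numbers are illustrative assumptions, not values from any study.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot for observed values and model predictions."""
    mean_obs = sum(observed) / len(observed)
    # Variation left over after the model has made its predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observations about their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Illustrative numbers only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.2f}, unexplained fraction = {1 - r2:.2f}")
```

A model that predicts every observation exactly gives a value of 1.0, whereas a model that simply predicts the mean of the observations gives 0.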
Given these statistical tools, we can use the regression equation
to predict the value of the dependent variable based on the known value
of the independent variable. Since many variables may contribute to
the outcome (dependent variable), further statistical analysis can
be achieved with multiple regression analysis. These models are
essentially the same as simple regression analysis, except that
the multiple regression analysis equation describes the interrelationship
of many variables and allows us to evaluate the joint effect of
these variables on the outcome variable in question.
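
To make the idea of a multiple regression equation concrete, the sketch below estimates an intercept and two coefficients by solving the least-squares normal equations. It is a teaching sketch under our own assumptions: the helper names are hypothetical, the data are constructed so that the outcome equals 1 + 2·x1 + 3·x2 exactly, and in practice a statistics package would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_multiple_regression(predictors, outcome):
    """Return [intercept, b1, b2, ...] minimising the squared prediction error.

    predictors is a list of rows, one row of predictor values per subject.
    """
    design = [[1.0] + list(row) for row in predictors]  # leading 1 = intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Constructed data: outcome = 1 + 2 * x1 + 3 * x2 exactly (no noise).
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

coefficients = fit_multiple_regression(predictors, outcome)
print([round(c, 6) for c in coefficients])
```

Each coefficient describes the change in the outcome per unit change in one predictor with the other predictors held fixed, which is the sense in which the equation captures their joint effect.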
Poitras et al² report
an interesting study this month that aims to predict length of stay and
early clinical function following joint arthroplasty. Multiple linear
regression analyses produced an equation based on the timed-up-and-go
test, which was associated with length of stay. In addition, models
based on the pre-operative WOMAC function sub-score produced the
best model for describing early post-operative function (as calculated by
the Older Americans Resources and Services ADL score). As such, the
authors concluded that the outcome assessments (timed-up-and-go
and WOMAC) were predictive of outcome, and further modelling identified
thresholds of the outcome assessment scores that related to better
and worse outcomes.
How should we interpret these findings? The authors quite correctly
suggest that models such as these could be of value in discharge
planning and resource utilisation by targeting the patients that
most need intervention and rehabilitation. The reported R² for
the models, however, was 0.18. Bearing in mind that R²,
the coefficient of determination, measures the proportion of the
variation in the dependent variable that is explained by variation
in the independent variable,³ taking
the complement (1 − R² = 0.82) we see that 82% of the variation
in the outcome parameter assessed is unexplained by the model. The
principal problem is that the variance in the population studied
can strongly influence the magnitude of R². Therefore, there
is no guarantee that a high coefficient of determination is indicative
of ‘goodness of fit’. Similarly, there is no guarantee that a small
R² indicates a weak relationship, given that the statistic
is largely influenced by variation in the independent variable.⁴
Therefore, there is no fixed rule for interpreting the strength of
R² in terms of clinical relevance. High values
of R² can be obtained with clinical data sets;⁵ however, a model with a low R² can
still describe trends in the data in a clinically useful way,
albeit with low precision. In this study there is an association
between the performance tests and length of stay and, using the
equations, we can indeed predict one from the other. The accuracy
of this prediction, though, needs to be borne in mind when using
it as a clinical tool.
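
The dependence of R² on the spread of the independent variable can be illustrated with a small simulation. In the sketch below, which uses invented parameters and hypothetical function names, the true slope and the amount of random scatter are identical in both samples; only the range of the predictor differs.

```python
import random


def r_squared_simple(x, y):
    """R-squared of a one-predictor least-squares line (equal to r squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def simulate(x_low, x_high, n=500, slope=2.0, noise_sd=2.0, seed=1):
    """Draw x uniformly from [x_low, x_high] and set y = slope * x + Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.uniform(x_low, x_high) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y


# Same underlying relationship and scatter, different spread of the predictor.
x_narrow, y_narrow = simulate(0.0, 1.0)
x_wide, y_wide = simulate(0.0, 10.0)

r2_narrow = r_squared_simple(x_narrow, y_narrow)
r2_wide = r_squared_simple(x_wide, y_wide)
print(f"narrow range of x: R^2 = {r2_narrow:.2f}")
print(f"wide range of x:   R^2 = {r2_wide:.2f}")
```

The sample with the wider range of x yields a much higher R², even though the relationship between the variables and the size of the prediction errors are unchanged.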
Furthermore, it is not rational to compare R² across
different samples, which, in clinical populations, are likely
to differ significantly in the variance of the independent and dependent
variables.⁶
In controlled environments, such as biomechanical tests on cadaveric
bones, the variance across predictive measurements is likely to
be low, and therefore R² values can be expected to lie
in the region of 0.8.⁷ In
clinical studies, however, R² values vary widely depending
on the nature of the analysis. For example, when comparing radiographic
parameters or associating surgical technical factors, values of
R² are reported in the 0.2 to 0.4 range,⁸,⁹ whereas comparing data between separate
(but intrinsically similar) outcome assessment questionnaires can yield
higher values in excess of 0.7.¹⁰
As such, further validation of the Poitras study² using new datasets
and, ideally, confirmatory analysis of the findings using a much
larger sample size, would be required before their regression model
could be recommended for use clinically. This does not devalue the appropriateness
– or indeed ‘worthiness’ – of reporting these findings in the literature,
as important clinical tools typically start as ideas in small
datasets. As with all research papers, the reader requires a basic
understanding of methodology to evaluate how relevant the results are
to wider practice.