Improvement Science is a well-established field, with methods tested and proven in
agriculture and other industries for over a century. Improvement Science emphasizes
rapidly testing new approaches in situ to create evidence about which changes lead
to improvement in which contexts. The knowledge gained is then used to create wider
or more substantial improvement. Executing Improvement Science requires both individuals
familiar with the discipline in which the work is performed and experts in the
methods of Improvement Science. Improvement Science is relatively new to health care
systems, which have previously considered only randomized controlled trials (RCTs)
to be the gold-standard method for creating new evidence. Although RCTs create evidence
about the efficacy of therapies, Improvement Science focuses on creating evidence
about how to improve the systems that deliver those therapies. As Improvement Science
studies spread, we feel compelled to correct inaccurate messages about their proper
methodology and reporting. Mischaracterization of these proven methodologies can sabotage
the potential benefits of well-done applications of Improvement Science.
A recent editorial in a high-impact journal suggested that Improvement Science studies
(aka Quality Improvement, or QI) and their reports needed improvement.[1]
They indicated that studies “should have results that are generalizable,” should report
health rather than process outcomes, and should have contemporaneous control groups.
They also recommended randomization and blinding. Unfortunately, although we agree with
some of these concepts (even occasional randomization and blinding), we are concerned
that the editorial presented an excessively narrow view of the appropriate methods for
establishing generalizable evidence about interventions to improve health care quality,
safety, and value. In a commentary published in JAMA a decade ago, Don Berwick masterfully
deconstructed the “unhappy tension” between research meant to improve clinical evidence
and research meant to improve care processes.[2]
More recently, Burke and Shojania[3] note that a strength of Improvement Science is the
ability to refine the intervention or the implementation strategy. Supported by that
literature and by our collective Improvement Science experience, we suggest more
appropriate directions for evaluating quality improvement research; these suggestions
were not accepted for consideration for publication by the journal that published the
original editorial.
One point raised by Grady et al.[1] with which we agree is that generalizable studies
are more powerful than those that are not. This is true both for RCTs and for Improvement
Science using other study designs, as both have deficiencies when applied to broader
contexts. Moreover, we agree that single-group, pre–post designs face multiple threats
to validity. A key method by which Improvement Science generates generalizable knowledge
is testing and retesting of interventions, either at multiple scales in 1 context or in
multiple contexts. This may occur through spread to other microsystems or institutions,
including study designs such as multiple interrupted time series and stepped wedge
methods (sequential but random introduction of an intervention to clusters of individuals).
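To make the stepped wedge design concrete, the following minimal sketch (ours, not drawn
from any cited study; the unit names and step count are hypothetical) randomly assigns
clusters to crossover times so that every cluster eventually receives the intervention:

```python
import random

def stepped_wedge_schedule(clusters, n_steps, seed=42):
    """Randomly assign each cluster the step at which it crosses over
    from control to intervention; by step n_steps, all have crossed."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    shuffled = clusters[:]
    rng.shuffle(shuffled)  # randomize the order of crossover
    # Spread crossover times evenly across steps 1..n_steps
    return {c: 1 + i * n_steps // len(shuffled) for i, c in enumerate(shuffled)}

units = ["ED-A", "ED-B", "ED-C", "ED-D", "ED-E", "ED-F"]  # hypothetical clusters
for unit, step in sorted(stepped_wedge_schedule(units, n_steps=3).items()):
    print(f"{unit}: intervention begins at step {step}")
```

Because each cluster contributes both control-period and intervention-period
observations, the design can separate the intervention effect from secular trends.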
Properly scaled and implemented Plan-Do-Study-Act (PDSA) cycles can provide learning
about the system that leads quickly and effectively to improvement in both process and
outcome measures. RCTs are usually not well suited to answering how multiple interventions
might work within complex systems; in fact, RCTs lack generalizability when applied to
complex systems.[4–6]
When reporting follows the SQUIRE guidelines, discussion of intervention scale and
interaction with the environment, specific contexts, and generalizability is provided,
enabling readers to learn how to apply published studies to their own microsystem
or problem.[7]
We also agree that studies demonstrating outcome or value improvement provide more
useful information than studies reporting process measures alone. Furthermore, well-designed
Improvement Science projects should include measurement of potential adverse outcomes
as meaningful balancing measures.
However, we strongly disagree with requiring implementation research to adopt methods
from randomized controlled designs, specifically concurrent control groups, randomization,
and blinding of results, as a condition of publication in medical journals. The methods
of Improvement Science are as rigorous as those of RCTs and are more appropriate to the
types of questions asked in quality improvement work. Using PDSA cycles, Improvement
Science studies allow iterative change to successively improve an intervention. Although
this is not always compatible with traditional RCT designs, it is compatible with other
rigorous evaluation methods, such as Shewhart statistical process control charts.
Rigorous randomized designs are appropriate for some questions in some settings, but we
object to limiting improvement activities to traditional RCTs. For example, if an
emergency room is inadequate at quickly providing antibiotics to patients in septic
shock, PDSA cycles may be the best way to ensure that care rapidly becomes adequate.
The knowledge gained during this process is likely useful to other emergency rooms.
Moreover, doing this in several emergency rooms and documenting similarities and
differences in approach makes the learning more generalizable. The beauty of PDSA cycles
and real-time data analysis through control charts is that intervention effects can be
seen much more rapidly than with a drawn-out RCT and post hoc data analysis. In fact,
an RCT in this case could harm the half of patients not exposed to a beneficial
intervention that we know works (providing antibiotics for sepsis). For this reason,
while concurrent control groups are 1 approach to addressing important threats to
validity, demonstrating through repeated measurements that there has been a significant
change from the baseline pattern is another valid approach. Because Improvement Science
activity aims to improve care using established evidence, and specifically because
randomization is not needed to demonstrate improvement, many IRBs recognize that this
activity is exempt from review, as outlined by the Department of Health and Human
Services.[8]
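To illustrate the repeated-measurements approach, here is a minimal sketch (ours; the
weekly timing data and the antibiotic scenario are hypothetical) of a Shewhart
individuals (XmR) chart, whose control limits are the baseline mean plus or minus 2.66
times the average moving range:

```python
def xmr_limits(values):
    """Compute the center line and control limits of a Shewhart
    individuals (XmR) chart from baseline observations."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR constant (3 / d2, with d2 = 1.128)
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr

# Hypothetical baseline: weekly median minutes to first antibiotic dose
baseline = [72, 65, 80, 58, 75, 69, 77, 63, 71, 66]
lcl, cl, ucl = xmr_limits(baseline)
print(f"LCL={lcl:.1f}  CL={cl:.1f}  UCL={ucl:.1f}")

# Weekly values observed after successive PDSA cycles
for week, minutes in enumerate([60, 52, 41, 38], start=11):
    signal = "special cause" if not lcl <= minutes <= ucl else "common cause"
    print(f"week {week}: {minutes} min -> {signal}")
```

A point falling outside the limits (here, week 14) signals special-cause variation, that
is, a significant departure from the baseline pattern, and it does so without withholding
the intervention from any patient.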
In summary, we feel that the literature now contains an erroneous message, one that
fails to take into account which study designs best answer the questions an Improvement
Science study is expected to answer. As Berwick[2] stated, “‘Where is the randomized
trial?’ is, for many purposes, the right question, but for many others it is the wrong
question, a myopic one.” We believe that misunderstanding of the goals, scientific
basis, and methods of Improvement Science by some journal editors is precisely why some
prestigious journals rarely publish excellent Improvement Science work. This bias
against publishing the knowledge generated by improvement scientists does a disservice
to our health care providers and their patients.
DISCLOSURE
The authors have no financial interest to declare in relation to the content of this
article.