1. Summary
1.1 Introduction
The breast cancer screening programmes in the United Kingdom currently invite women
aged 50–70 years for screening mammography every 3 years. Since the time the screening
programmes were established, there has been debate, at times sharply polarised, over
the magnitude of their benefit and harm, and the balance between them. The expected
major benefit is reduction in mortality from breast cancer. The major harm is overdiagnosis
and its consequences; overdiagnosis refers to the detection on screening of cancers
that would not have become clinically apparent in the woman's lifetime in the absence
of screening.
Professor Sir Mike Richards, National Cancer Director, England, and Dr Harpal Kumar,
Chief Executive Officer of Cancer Research UK, asked Professor Sir Michael Marmot
to convene and chair an independent panel to review the evidence on benefits and harms
of breast screening in the context of the UK breast screening programmes. The panel,
authors of this report, reviewed the extensive literature and heard testimony from
experts in the field who were the main contributors to the debate.
The nature of information communicated to the public, which has also sparked debate,
was not part of the terms of reference of the panel, which are listed in Appendix
1.
1.2 Relative mortality benefit
The purpose of screening is to advance the time of diagnosis so that prognosis can
be improved by earlier intervention. A consequence of earlier diagnosis is that it
increases the apparent incidence of breast cancer in a screened population and extends
the average time from diagnosis to death, even if screening were to confer no benefit.
The appropriate measure of benefit, therefore, is reduction in mortality from breast
cancer in women offered screening compared with women not offered screening.
In the panel's judgement, the best evidence for the relative benefit of screening
on mortality reduction comes from 11 randomised controlled trials (RCTs) of breast
screening. Meta-analysis of these trials with 13 years of follow-up estimated a 20%
reduction in breast cancer mortality in women invited for screening. The relative
reduction in mortality will be higher for women actually attending screening, but
by how much is difficult to say because women who do not attend are likely to have
a different background risk. Three types of uncertainties surround this estimate of
20% reduction in breast cancer mortality. The first is statistical: the 95% confidence
interval (CI) around the relative risk (RR) reduction of 20% was 11–27%. The second
is bias: there are a number of potential sources of distortion in the trials that
have been widely discussed in the literature ranging from suboptimal randomisation
to problems in adjudicating cause of death. The third is the relevance of these old
trials to the current screening programmes. The panel acknowledged these uncertainties,
but concluded that a 20% reduction is still the most reasonable estimate of the effect
of the current UK screening programmes on breast cancer mortality. Most other reviews
of the RCTs have yielded similar estimates of relative benefit.
The RCTs were all conducted at least 20–30 years ago. More contemporary estimates
of the benefit of breast cancer screening come from observational studies. The panel
reviewed three types of observational studies. The first were ecological studies comparing
areas, or time periods, when screening programmes were and were not in place. These
have generated diverse findings, partly because of the major advances in treatment
of breast cancer, which are likely to have influenced mortality trends more than
screening has, and partly because of the difficulty of excluding imbalances in other
factors that could affect breast cancer mortality. The panel did not consider these
studies helpful in estimating the effect of screening on mortality. The other two
types of studies, case–control studies and incidence-based mortality studies, showed
breast screening to confer a greater benefit than did the trials. Although these studies,
in general, attempted to control for non-comparability of screened and unscreened
women, the panel was concerned that residual bias could inflate the estimate of benefit.
However, the panel notes that these studies' findings are in the same direction as
the trials.
1.3 Absolute mortality benefit
Estimates of absolute benefit of screening have varied from one breast cancer death
avoided for every 2000 women invited to screening to one avoided for about every 100 women
screened, about a 20-fold difference. Major determinants of that large variation are the age
of women screened, and the durations of screening and follow-up. The age of the women
invited is important, as mortality from breast cancer increases markedly with age.
The panel therefore applied the relative mortality reduction of 20% to the
observed cumulative absolute risk of breast cancer mortality over the ages 55–79 years
for women in the United Kingdom, assuming that women who began screening at 50 years
would gain no benefit in the first 5 years, but that the mortality reduction would
continue for 10 years after screening ended. This yielded the estimate that for every
235 women invited to screening, one breast cancer death would be prevented; correspondingly
180 women would need to be screened to prevent one breast cancer death. Uncertainties
in the figure of a 20% RR reduction would carry through to these estimates of absolute
mortality benefit. Nonetheless, the panel's estimate of benefit is in the range of
one breast cancer death prevented for ∼250 women invited, rather than the range of
1 in 2000.
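The link between the relative and absolute figures above can be sketched with simple arithmetic. This is an illustration, not the panel's own calculation: the function names are hypothetical, the cumulative risk is back-calculated from the figure of 235 rather than quoted from the report, and the same cumulative risk is assumed to apply across the whole confidence interval.

```python
# Illustrative arithmetic only; function names are hypothetical, not from the report.

def implied_cumulative_risk(nni, rrr):
    """Cumulative risk implied by: NNI = 1 / (cumulative risk * RRR)."""
    return 1.0 / (nni * rrr)


def number_needed_to_invite(cumulative_risk, rrr):
    """Women invited per breast cancer death prevented."""
    return 1.0 / (cumulative_risk * rrr)


RRR = 0.20                       # panel's central relative risk reduction
RRR_LOW, RRR_HIGH = 0.11, 0.27   # 95% CI quoted in section 1.2
NNI_CENTRAL = 235                # panel's estimate per woman invited

# Back-calculated for illustration; not a figure reported in this section.
risk = implied_cumulative_risk(NNI_CENTRAL, RRR)

# Assumption: the same cumulative risk applies at both ends of the CI.
nni_if_rrr_low = number_needed_to_invite(risk, RRR_LOW)
nni_if_rrr_high = number_needed_to_invite(risk, RRR_HIGH)

print(f"implied cumulative risk: {risk:.2%}")
print(f"NNI if RRR is 11%: {nni_if_rrr_low:.0f}")
print(f"NNI if RRR is 27%: {nni_if_rrr_high:.0f}")
```

Under these assumptions, statistical uncertainty alone moves the estimate between roughly 174 and 427 women invited per death prevented, which remains far from 1 in 2000.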
1.4 Overdiagnosis
The major harm of screening considered by the panel was that of overdiagnosis. Given
the definition of an overdiagnosed cancer, either invasive or non-invasive, as one
diagnosed by screening that would not otherwise have come to attention in the woman's
lifetime, there is a need for long follow-up to assess the frequency of overdiagnosis.
In the view of the panel, some cancers detected by screening will be overdiagnosed,
but the uncertainty surrounding the extent of overdiagnosis is greater than that for
the estimate of mortality benefit because there are few sources of reliable data.
The issue for the UK screening programmes is the magnitude of overdiagnosis in women
who have been in a screening programme from age 50 to 70, then followed for the rest
of their lives. There are no data to answer this question directly. Any estimate will
therefore be, at best, provisional.
Although the definition of an overdiagnosed case, and thus the numerator in a ratio,
is clear, the choice of denominator has been the source of further variability in
published estimates. Different studies have used: only the cancers found by screening;
cancers found during the whole screening period, both screen-detected and interval;
cancers diagnosed during the screening period and for the remainder of the women's
lifetime. The panel focused on two estimates: the first from a population perspective
using as the denominator the number of breast cancers, both invasive and ductal carcinoma
in situ (DCIS), diagnosed throughout the rest of a woman's lifetime after the age
that screening begins, and the second from the perspective of a woman invited to screening
using the total number of breast cancers diagnosed during the screening period as
the denominator.
The panel thought that the best evidence came from three RCTs that did not systematically
screen the control group at the end of the screening period and followed these women
for several more years. The frequency of overdiagnosis was of the order of 11% from
a population perspective, and about 19% from the perspective of a woman invited to
screening. Trials that included systematic screening of the control group at the end
of the active part of the trial were not considered to provide informative estimates
of the frequency of overdiagnosis.
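Why the same excess of cancers yields two different percentages can be shown with a small sketch. The counts below are hypothetical, chosen only so that the two ratios land near the panel's estimates; they are not data from the trials.

```python
# Hypothetical counts, chosen only to illustrate the choice of denominator.
EXCESS_CANCERS = 110             # cancers attributed to overdiagnosis (numerator)
CANCERS_SCREENING_PERIOD = 580   # all cancers diagnosed during the screening period
CANCERS_LIFETIME = 1000          # all cancers from start of screening to end of life


def overdiagnosis_percent(excess, denominator):
    """Overdiagnosed cancers as a percentage of the chosen denominator."""
    return 100.0 * excess / denominator


# Perspective of a woman invited to screening: screening-period denominator.
invited_woman_view = overdiagnosis_percent(EXCESS_CANCERS, CANCERS_SCREENING_PERIOD)

# Population perspective: lifetime denominator.
population_view = overdiagnosis_percent(EXCESS_CANCERS, CANCERS_LIFETIME)

print(f"invited-woman perspective: {invited_woman_view:.0f}%")
print(f"population perspective:    {population_view:.0f}%")
```

The numerator is identical in both cases; only the denominator changes, which is why published estimates must state which one they use.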
Information from observational studies was also considered. One method that has been
used is investigation of time trends in incidence rates of breast cancer for different
age groups over the period that population screening was introduced. The published
results of these studies varied greatly and have been interpreted as providing either
reassurance or cause for alarm. So great was the variation in results that the panel
conducted an exercise by varying the assumptions and statistical methods underlying
these studies, using the same data sets; estimates of overdiagnosis rates were found
to vary across the range of 0–36% of invasive breast cancers diagnosed during the
screening period. The panel had no reason to favour one set of estimates over another,
and concluded that this method could give no reliable estimate of the extent of overdiagnosis.
Were it possible to distinguish at screening those cancers that would not otherwise
have come to attention from those that, untreated, would lead to death, the overdiagnosis
problem could be much reduced, at least in terms of unnecessary worry and treatment.
Currently this is not possible, so neither the woman nor her doctor can know whether
a screen-detected cancer is an ‘overdiagnosed’ case or not. In particular, DCIS, most
often diagnosed at screening, does not inevitably equate to overdiagnosis – screen-detected
DCIS, after wide local excision (WLE) only, is associated with subsequent development
of invasive breast cancer in 10% of women within 10 years.
The consequences of overdiagnosis matter: women are turned into patients unnecessarily,
surgery and other forms of cancer treatment are undertaken, and quality of life and
psychological well-being are adversely affected.
1.5 The balance of benefit and harm
The panel estimates that an invitation to breast screening delivers about a 20% reduction
in breast cancer mortality. For the UK screening programmes, this currently corresponds
to about 1300 deaths from breast cancer being prevented each year, or equivalently
about 22 000 years of life being saved. However, this benefit must be balanced against
the harms of screening, especially the risk of overdiagnosis. In the panel's view,
overdiagnosed cancers certainly occur, but the frequency in a screening programme
of 20 years' duration is unknown. Estimates from trials of shorter duration suggest
overdiagnosis of about 11% as a proportion of breast cancer incidence during the screening
period and for the remainder of the woman's lifetime, or equivalently about 19% as
a proportion of cancers diagnosed during the screening period. Any excess mortality
stemming from the investigation and treatment of breast cancer is considered by the
panel to be small and considerably outweighed by the benefits of treatment. Some other
harms, including increased anxiety and discomfort caused by screening, are also acknowledged.
Notionally, for 10 000 women invited to screening, from age 50 for 20 years, it is
estimated that 681 cancers (invasive and DCIS) will be diagnosed, of which 129 will
represent overdiagnosis (using the 19% estimate of overdiagnosis) and 43 deaths from
breast cancer will be prevented.
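The notional figures per 10 000 women follow from numbers already given in this summary. The sketch below reproduces that arithmetic; the variable names are illustrative, and rounding to whole women is an assumption about how the figures were presented.

```python
WOMEN_INVITED = 10_000
CANCERS_DIAGNOSED = 681          # invasive and DCIS, over 20 years of screening
OVERDIAGNOSIS_FRACTION = 0.19    # as a proportion of screening-period cancers
NUMBER_NEEDED_TO_INVITE = 235    # women invited per breast cancer death prevented

# Overdiagnosed cancers: 19% of the cancers diagnosed during screening.
overdiagnosed = CANCERS_DIAGNOSED * OVERDIAGNOSIS_FRACTION

# Deaths prevented: one per 235 women invited.
deaths_prevented = WOMEN_INVITED / NUMBER_NEEDED_TO_INVITE

print(f"overdiagnosed cancers: {overdiagnosed:.0f}")
print(f"deaths prevented:      {deaths_prevented:.0f}")
```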
Given that the treatment for breast cancer has improved, is screening no longer relevant?
The panel's view is that the benefits of screening and those of better treatments
are reasonably considered independent. Uncertainty about possible interaction between
the benefits of screening and of contemporary treatments is not a reason for stopping
breast screening.
The panel was not asked to comment on costs, whether of the interventions or of the
consequences of overdiagnosis. With accurate figures, an estimate of cost-benefit could be
made and compared with other interventions, but this would be a significant piece of work
in its own right.
An individual woman cannot know whether she is one of the numbers who will benefit
or be harmed from screening. If she chooses to be screened, it should be in the knowledge
that she is accepting the chance of benefit, having her life extended, knowing that
there is also a risk of overdiagnosis and unnecessary treatment. Similarly, a woman
who declines the invitation to screening needs to recognise that she runs a slightly
higher risk of dying from breast cancer.
1.6 Conclusions and recommendations
Breast screening extends lives. The panel's review of the evidence on benefit – the
older RCTs, and those more recent observational studies – points to a 20% reduction
in mortality in women invited to screening. A great deal of uncertainty surrounds
this estimate, but it represents the panel's overview of the evidence. This corresponds
to one breast cancer death averted for every 235 women invited to screening for 20
years, and one death averted for every 180 women who attend screening.
The panel's best estimate is that the breast screening programmes in the United Kingdom,
inviting women aged 50–70 every 3 years, prevent about 1300 breast cancer deaths a
year, a most welcome benefit to women and to the public health.
However, there is a cost to women's well-being. In addition to extending some lives
by early detection and treatment, mammographic screening detects cancers, proven to
be cancers by pathological testing, that would not have come to clinical attention
in the woman's lifetime were it not for screening; this is called overdiagnosis. The consequence
of overdiagnosis is that women have their cancer treated by surgery, radiotherapy
and medication, but neither the woman nor her doctor can know whether this particular
cancer would be one that could possibly lead to death, or one that would have remained
undetected for the rest of the woman's life.
The panel sought to estimate the level of overdiagnosis in women screened for 20 years
and followed to the end of their lives. Estimates of overdiagnosis abound, ranging from
close to zero to 50%, but there is a paucity of reliable data to answer this question. There
has not even been agreement on how to measure overdiagnosis. On the basis of follow-up
of three RCTs, the panel estimated that in women invited to screening, about 11% of
the cancers diagnosed in their lifetime constitute overdiagnosis, and about 19% of
the cancers diagnosed during the period that women are actually in the screening programme;
but the panel emphasises these figures are the best estimates from a paucity of reliable
data.
Putting together benefit and overdiagnosis from the above figures, the panel estimates
that for 10 000 UK women invited to screening from age 50 for 20 years, about 681
cancers will be found of which 129 will represent overdiagnosis, and 43 deaths from
breast cancer will be prevented. In round terms, therefore, for each breast cancer
death prevented, about three overdiagnosed cases will be identified and treated. Of
the ∼307 000 women aged 50–52 who are invited to screening each year, just over 1% would
have an overdiagnosed cancer during the next 20 years. Given the uncertainties around
the estimates, the figures quoted give a spurious impression of accuracy.
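The round-terms ratio and the percentage quoted above can be checked directly. The sketch assumes simple proportional scaling from the notional 10 000 women to the annual invited cohort; the resulting headcount is an illustration, not a figure stated in this section.

```python
OVERDIAGNOSED_PER_10000 = 129
DEATHS_PREVENTED_PER_10000 = 43
WOMEN_INVITED_EACH_YEAR = 307_000   # women aged 50 to 52 invited each year

# Overdiagnosed cases per breast cancer death prevented.
cases_per_death_prevented = OVERDIAGNOSED_PER_10000 / DEATHS_PREVENTED_PER_10000

# Share of invited women with an overdiagnosed cancer over the next 20 years.
share_overdiagnosed = OVERDIAGNOSED_PER_10000 / 10_000

# Assumption: the per-10 000 rate scales proportionally to the annual cohort.
women_overdiagnosed = WOMEN_INVITED_EACH_YEAR * share_overdiagnosed

print(f"overdiagnosed cases per death prevented: {cases_per_death_prevented:.1f}")
print(f"share of invited women overdiagnosed:    {share_overdiagnosed:.2%}")
print(f"women per annual cohort (illustrative):  {women_overdiagnosed:.0f}")
```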
The panel concludes that the UK breast screening programmes confer significant benefit
and should continue. The greater the proportion of women who accept the invitation
to be screened, the greater is the benefit to the public health in terms of reduction
in mortality from breast cancer. However, for each woman the choice is clear. On the
plus side, screening confers a likely reduction in mortality from breast cancer because
of early detection and treatment. On the negative side is the knowledge that she
has perhaps a 1% chance of having a cancer diagnosed, and treated with surgery and
other modalities, which would never have caused problems had she not been screened.
Evidence from a focus group conducted by Cancer Research UK and attended by two panel
members, and in line with previous similar studies, was that this was an offer many
women will feel is worth accepting: the treatment of overdiagnosed cancer may cause
suffering and anxiety, but that suffering is worth the gain from the potential reduction
in breast cancer mortality. Clear communication of these harms and benefits to women
is of utmost importance and goes to the heart of how a modern health system should
function. There is a body of knowledge on how women want information presented, and
this should inform the design of information to the public.
2. Introduction
2.1 The UK NHS breast screening programmes
The NHS breast cancer screening programme in England began inviting women to be screened
in 1988. This followed the recommendations made by Professor Sir Patrick Forrest in
his report on breast screening in 1986 (Forrest, 1986). The breast screening programmes
in the United Kingdom currently invite women aged 50–70 years for screening mammography
every 3 years. Mammography is designed to detect changes in the breast tissue
that may indicate the presence of cancer. The screening programme in England is currently
conducting a randomised trial to ascertain whether there would be benefit in extending
the age range over which women are invited to 47–73 years.
2.2 Principles of screening
Screening is concerned with the detection of disease at an early stage, with the expectation
that treatment will be more effective if begun earlier in the disease process. Screening
is therefore based on the principle of there being an effective treatment. It is well
recognised that an apparent benefit of increased survival time could be illusory because
of simply bringing forward the time of diagnosis without changing the course of the
disease. Therefore, the appropriate way to assess benefit is to look at breast cancer
mortality of screened and unscreened cohorts rather than just survival time from diagnosis
(see section 3).
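The illusory survival gain described above can be made concrete with a toy cohort. Everything in this sketch is hypothetical: the ages, the two-year lead time, and the assumption that screening changes nothing except the date of diagnosis.

```python
# Hypothetical cohort: (age at symptomatic diagnosis, age at death from breast cancer).
cohort = [(58, 63), (61, 70), (66, 69), (55, 64)]
LEAD_TIME_YEARS = 2  # assumed: screening finds each cancer this much earlier


def mean_survival(diagnosis_and_death_ages):
    """Average number of years from diagnosis to death."""
    total = sum(death - diagnosis for diagnosis, death in diagnosis_and_death_ages)
    return total / len(diagnosis_and_death_ages)


# Screening with no benefit: diagnosis moves earlier, age at death is unchanged.
screened = [(diagnosis - LEAD_TIME_YEARS, death) for diagnosis, death in cohort]

survival_unscreened = mean_survival(cohort)
survival_screened = mean_survival(screened)

# Mortality is identical in both groups: every woman dies at the same age.
deaths_unscreened = len(cohort)
deaths_screened = len(screened)

print(f"mean survival, unscreened: {survival_unscreened} years")
print(f"mean survival, screened:   {survival_screened} years")
print(f"deaths: {deaths_unscreened} vs {deaths_screened}")
```

Survival from diagnosis lengthens by exactly the lead time although no death is postponed, which is why mortality rather than survival time is the appropriate comparison.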
As the principle of screening is to diagnose cases earlier, at any particular time
point during the period of successive screenings, there will be more cases of breast
cancer in a group of screened women compared with a similar group of unscreened women.
However, it is possible that some of these additional cases may be cancers that would
not otherwise have been diagnosed or caused the woman any problem during her lifetime.
The detection of such cancers is referred to as overdiagnosis (see section 4).
2.3 The debate over benefits and harms of breast screening
Since the screening programmes were established, there has been debate over the potential
benefits and harms. Recently, the debate has focussed on the reduction in mortality
attributable to screening, the numbers of women overdiagnosed, and the way that the
risks and benefits are communicated to women invited for screening. The arguments
have become quite polarised between those who believe that the benefit of decreased
breast cancer mortality outweighs the harms and those who believe the harms outweigh
the benefit. These differing views of the evidence have arisen, in part, from disagreements
over the validity and applicability of the available RCTs of breast screening, and
from questions about the usefulness and interpretation of observational data on breast
cancer incidence and mortality.
The debate over the benefits and harms of breast screening is not unique to the UK
and the NHS breast screening programmes. In 2002, the International Agency for Research
on Cancer at the World Health Organisation reviewed the evidence on breast screening,
and put forward recommendations on further research and on implementing screening
programmes (IARC, 2002). The US Preventive Services Task Force in 2009 re-examined
the efficacy of different screening modalities. They recommended that women under
the age of 50 not be routinely screened, and that women aged 50–74 have biennial rather
than annual screens (Woolf, 2010). The Canadian Task Force on Preventive Health Care
updated their guidelines on breast screening in 2011, and concluded that the reduction
in mortality associated with screening mammography is small for women aged 40–74 years
at average risk of breast cancer. They also found a greater reduction in mortality
for women aged ⩾50 compared with those <50, and that harms of overdiagnosis and unnecessary
biopsy may be greater for younger women than for older women. They recommended that
women aged 50–74 be routinely screened but state that appreciable uncertainty exists
around the evidence for this (Canadian Task Force on Preventive Health Care, 2011).
Published reports from the Nordic Cochrane Centre concluded that, despite their substantial
methodological limitations, the trials of screening showed that screening saved lives,
but at the cost of considerable harm from overdiagnosis (Gøtzsche and Nielsen, 2011).
2.4 Breast cancer in the UK
Incidence and mortality
In the United Kingdom, breast cancer remains the most commonly diagnosed cancer in
women (48 417 cases in 2009) and is the second most common cause of death from cancer
in women (11 556 deaths in 2010). UK breast cancer incidence rates have been rising
in all age groups since the late 1970s (Figure 1A). The causes of these increasing
rates are thought to include: increased use of hormone replacement therapy; later
age at child birth; lower parity; and increasing obesity and alcohol intake in women.
Also, there is believed to be better ascertainment, especially in older women. In
common with most countries, the introduction of the screening programme for women
aged 50–64 in 1988 and those aged 65–70 in 2001 led to additional increases in incidence
(Figure 1A).
By contrast with incidence rates, since the early 1990s, the mortality rates for breast
cancer have been decreasing – shown both as annual mortality rates and 35-year cumulative
risk of dying from breast cancer (Figure 1B). It is believed that the causes of these
decreases may include: improvement in treatment, in particular adjuvant therapies;
specialisation and better organisation of cancer care; screening; and increased breast
awareness (Appendix 2).
Contribution of screening to decreased breast cancer mortality
It is widely agreed that screening alone cannot be the major factor responsible for
the decrease in breast cancer mortality over the last 20 years. Improvements in treatment
and service delivery are likely to have made the largest contribution to decreased
mortality (Berry et al, 2005). Indeed, without effective treatment, screening for
breast cancer is redundant. However, it is important to establish what contribution,
if any, screening makes, given that it requires the use of substantial resources within
the health system, and nearly two million women each year in England alone accept
the invitation and agree to be screened (The NHS Information Centre, 2012).
2.5 Independent review of breast screening
It is within this context that Professor Sir Mike Richards, National Cancer Director,
England, and Dr Harpal Kumar, Chief Executive Officer of Cancer Research UK, asked
Professor Sir Michael Marmot to chair an independent panel to review breast screening.
The panel's terms of reference are shown in Appendix 1. This panel has reviewed the
extensive literature and heard testimony from many of the experts in the field. This
report details its findings and recommendations for the breast screening programme
in England.
2.6 Independent review panel membership
The independent panel consisted of nationally and internationally recognised experts
in epidemiology and/or medical statistics, as well as in current breast cancer diagnosis
and treatment practices. A patient advocate was an integral member of the panel. No
panel member had previously published on breast screening, thus helping to ensure
an objective and independent assessment of the evidence.
The panel was chaired by Professor Sir Michael G Marmot, Director of the Institute
of Health Equity, University College London; Chair, WHO Commission on Social Determinants
of Health; Chair, Marmot Review – Strategic Review of Health Inequalities in England
after 2010; Chair, European Review on the Social Determinants of Health and the Health
Divide; MRC Research Professor of Epidemiology and Public Health, University College
London, with long-standing research on social determinants of health and health inequalities.
The other panellists were:
Professor Douglas G Altman, Director of the Centre for Statistics in Medicine and
Cancer Research UK Medical Statistics Group, University of Oxford. Doug's varied research
interests include the use and abuse of statistics in medical research, studies of
prognosis, regression modelling, systematic reviews, randomised trials, and studies
of medical measurement. He is actively involved in efforts to improve the quality
of scientific publications by promoting transparent and accurate reporting of health
research.
Professor David A Cameron, Clinical Director of the Edinburgh Cancer Research Centre,
Director of Cancer Services at NHS Lothian, and Professor of Oncology at Edinburgh
University. Previously, David was the Director of the NIHR National Cancer Research
Network and Professor of Oncology at Leeds University. His research interests are
in translational and clinical trials in breast cancer, and he is the principal investigator
of several clinical trials looking at treatment of early breast cancer. Before qualifying
as a medical doctor, he completed an undergraduate degree in Mathematics.
Professor John A Dewar, Consultant and honorary Professor of Clinical Oncology. Until
recently, John was Head of Oncology at Ninewells Hospital, Dundee. John has a long-standing
interest in the management of patients with breast cancer and has been closely involved
in clinical trials of both radiotherapy and systemic therapy for breast cancer.
Professor Simon G Thompson, Director of Research in Biostatistics at the University
of Cambridge. Simon's research interests are in meta-analysis and evidence synthesis,
clinical trial methodology, health economic evaluation, and cardiovascular epidemiology.
He has collaborated on a number of major clinical trials, recently including all the
major UK national trials of screening and treatment for abdominal aortic aneurysms.
Maggie Wilcox, patient advocate. Maggie was a health visitor for many years and then
worked as a Clinical Nurse Specialist in palliative care before her breast cancer diagnosis
in 1997. After early retirement following her treatment, she became involved in patient
advocacy in cancer services and research. She now provides a patient voice at national
and local level as a member of various organisations, including the National Cancer
Research Institute Breast Clinical Study Group and the Surrey, West Sussex and Hampshire
Network Breast Site Specific Group.
2.7 Independent review process and role of secretariat
As set out in the review's terms of reference, the secretariat provided initial key
literature on breast cancer screening, including publications recommended from both
sides of the debate. The panel then called on a range of experts (see Appendix 1 for
full list) to give evidence.
Cancer Research UK and the Department of Health provided the secretariat function
for the review comprising:
Dr Dulcie McBride, Consultant in Public Health Medicine, Department of Health
Sara Hiom, Director of Information, Cancer Research UK
Nick Ormiston-Smith, Data Analysis and Research Manager, Cancer Research UK
Dr Martine Bomb, Programme Manager, Cancer Research UK
Samantha Harrison, Programme Officer, Cancer Research UK
The secretariat acted purely as support to the panel in the practical, writing, and
dissemination functions and had no say in the conclusions or recommendations. Further
information can be found in Appendix 1.
3. The effect of breast screening on mortality
This section summarises the panel's views of the effect of breast screening on mortality.
Specifically, the aim is to estimate the effect of the current national screening
programmes in the United Kingdom on breast cancer mortality. Estimates of relative
risk reduction, absolute risk reduction, and increase in life expectancy are discussed.
3.1 Introduction
Randomised controlled trials potentially provide the most reliable information about
the effects of breast screening. Well-conducted RCTs are prone to fewer distorting
effects, or biases, than observational studies. Systematic reviews and meta-analyses
of RCTs are widely accepted as the highest level of evidence for guiding policy decisions
on medical interventions. For this reason, our quantitative estimate of the benefits
of breast screening comes from the randomised trials of breast screening. Given the
wealth of observational studies on this issue, in section 3.6 we look to observational
studies as a possible guide to more contemporary estimates of the effects of screening
on mortality.
Randomised controlled trials, however, are not without their problems in practice.
Lack of internal validity, for example, through failures in proper randomisation,
losses to follow-up and misclassification of end points, can lead to biased estimates
of effects. Differences between the trials and the current UK context, for example,
in the type of screening undertaken or in the length of follow-up, lead to a lack
of external validity. Both the internal and external validity of the RCTs of breast
screening have been widely discussed.
A specific issue raised by some commentators is that most of the randomised trials
of breast screening date from the 1980s or earlier. Treatment and overall management
of breast cancer have improved considerably since that time. Are the trials still
relevant? Such a question can be asked of any area of medical investigation and treatment;
trials refer to the past and our use of interventions relates to the future. It is
an important area of judgement and one that the panel kept at the forefront of its
consideration.
The purpose of screening is to prolong survival, but length of survival from diagnosis
of breast cancer to death cannot be used as an end point in the RCTs, because the
cancers diagnosed by screening are diagnosed earlier than those diagnosed without
screening. Thus, even in the absence of any therapy, a cancer diagnosed earlier by
screening will have a better survival than the same cancer presenting later symptomatically.
Mortality after invitation to screening is the appropriate end point. However, concerns
have been raised about the use of breast cancer mortality. If the adjudication of
a death as due to breast cancer is influenced by the woman's screening history, then
the estimate of the effects on breast cancer mortality can become biased. For this
reason, some have argued that death from all cancers, or indeed all-cause mortality,
should be the primary outcome of interest in the trials. The panel disagrees with
this view (section 3.5). We also comment on the estimation of absolute risk differences,
as opposed to RRs, and the difference between the effects expressed per woman invited
and per woman screened.
The panel's view is that although the trials are far from perfect, they offer the
most reliable evidence on the RR reduction in breast cancer mortality to be derived
from screening.
3.2 Available randomised trials
Eleven randomised trials have been undertaken and reported (New York health insurance
plan (HIP), Malmö I and II, Swedish Two County (Kopparberg and Östergötland), Canada
I and II, Stockholm, Göteborg, UK Age trial, and Edinburgh; Table 1). The three trials
with two parts have sometimes but not always been reported separately in publications.
Three other randomised trials are mentioned in the Cochrane Review (Gøtzsche and Nielsen,
2011), but were excluded because they compared multiple interventions (not just mammography),
or made major post-randomisation exclusions. We also exclude these three studies from
our assessment.
All the trials compared women invited to screening with a control group not invited.
However, they varied considerably, for example, in terms of the method of randomisation,
age group of women invited, type of mammography employed, whether physical examination
or self-examination was also used in either the invited or control groups, interval
between screens, number of screens, length of follow-up, and system used for adjudicating
breast cancer deaths (Table 1).
Randomisation
The invited and control groups in the trials were constructed either by randomising
individuals, or by randomising clusters (geographical areas or general practices),
or by allocation according to day of birth. Individual randomisation, with adequate
allocation concealment, is rightly regarded as the most reliable method. For population
screening studies, however, cluster randomisation can also be adequate, provided sufficient
clusters are randomised and balance in social and other characteristics is achieved.
Women are identified through existing registers, and so it is unlikely that participation
bias, which afflicts some cluster trials (Puffer et al, 2003), would apply (for example,
through women moving between areas in order to avoid or obtain an invitation to breast
screening). Similarly, using allocation by day of birth would seem to be adequate
for population screening trials. Of the trials considered, the Edinburgh trial suffered
the most problems in terms of its cluster randomisation (Gøtzsche and Nielsen, 2011),
with some re-allocations and post-randomisation exclusions of clusters, which led
to severe baseline imbalances (26% of women in the control group and 53% in the invited
group were in the highest socioeconomic group). For this reason, like the Cochrane
Review, we exclude the Edinburgh trial from our main summary and comment on its results
separately (section 3.5).
Age
The trials recruited women of different ages (Table 1). Most overlapped extensively
with the age group 50–70 years, relevant to the UK programmes, but some (e.g. UK Age
trial, Malmö II) did not. We base our primary conclusions about RR on all the trials,
as this appears fairly constant across age groups (Nyström et al, 2002). There is
some evidence, however, that the RR may be attenuated in women under age 50 (Canadian
Task Force on Preventive Health Care, 2011), so we also consider an analysis that
excludes these women.
Duration of follow-up
Even in the pre-screening era, the median survival from diagnosis of breast cancer
was several years, so any benefits of screening in terms of mortality are not immediate,
but will accrue over time. So the best evidence would come from a trial with a long
duration of follow-up, comparing the invited group with a control group who are never
invited to screening. The data that come nearest to this are for the age group 55–69
in Malmö I, with a follow-up of 19 years. Most of the trials, however, started systematic
screening of the control group after 4–10 years. Little effect on mortality is seen
within the first 5 years of screening, so we regard a follow-up period of about 10–15
years after randomisation as providing the most reliable estimate of the RR. A shorter
follow-up time would put too much weight on the early period after initial screening,
whereas a longer period would include a greater diluting effect of screening in the
control group. So we base our primary conclusions about breast cancer mortality on
the data reported in the Cochrane Review, which provided results for 13 years of follow-up
of the groups as randomised (Gøtzsche and Nielsen, 2011).
Adjudicating cause of death
Potential biases from classifying cause of death have been a major source of contention,
especially in the Swedish trials. Ascribing a death as primarily due to breast cancer,
or not due to breast cancer, is not always easy or reliable. So, when the screening
history of a woman is known, or when a prior diagnosis of breast cancer has been made,
this could influence the adjudicated cause of death. There are two ways in which this
could distort the results of the trials. The first is overt bias, in which investigators
closely involved with the trial adjudicate cause of death and tend to avoid ascribing
the cause of death as breast cancer when the woman has been screened (and conversely
if she has not). This would exaggerate any beneficial effect of screening. This bias
(which may be subconscious) is avoided by the use of an independent end point committee
to ascribe causes of death, or by the use of death certificates from national registries.
These methods, however, do not avoid a second way in which a trial's results might be
affected; screening increases the number of breast cancers diagnosed, and such a diagnosis
may lead preferentially to classifying a subsequent death as due to breast cancer
rather than any other cause. This second bias operates against any beneficial effect
of screening.
Most trials used an independent end point committee to adjudicate causes of death
or took the underlying cause of death from national registries (Table 1). Some of
the Swedish trials were criticised for using trial investigators to ascribe cause
of death, but subsequent evaluations were made using independent and consensus committees
and national registry statistics (Nyström et al, 2002; Tábar et al, 2011). Although
the exact numbers of deaths from breast cancer were not the same when adjudication
was made using different methods, the overall estimates of RR of breast cancer mortality
did not change very much. Thus, although this issue is certainly one of the major
criticisms of the trials, the panel does not think it would exaggerate the estimates
of RR reduction obtained from individual trials, or indeed from a meta-analysis of
trials. We comment on the use of other mortality end points in section 3.5.
Other issues
Many other aspects of the trials have been discussed in the literature, some of which
we mention here. The numbers of women reported in each randomised group have not been
identical across the multiple publications from certain trials. Although this is somewhat
concerning, it is perhaps not surprising, given that population and other registers
are not always fully reliable, and data checks over time reveal duplicates and other
problems. Moreover, some publications are based on birth cohorts and others on exact
age groups (Nyström et al, 2002). The trials report excluding women with a prior diagnosis
of breast cancer. Although this is sensible, it can lead to problems if the exclusions
are more easily made in the invited group (for example, because of more information
obtained at screening) than in the control group. Some trials include physical examination
or self-examination in either or both of the randomised groups. However, there is
no evidence that these procedures influence breast cancer mortality (Canadian Task
Force on Preventive Health Care, 2011).
Conclusion
We acknowledge the problems and biases discussed above, but judge them as unlikely
to have had a major distorting effect on the overall result from a meta-analysis of
the trials. Moreover, the biases considered do not all operate in the same direction,
with some favouring screening and some acting against it. Although it is easy to be
critical of many detailed aspects of the breast screening trials, the relevant judgement
is whether the biases are so great as to make their results too misleading for guiding
policy. The panel does not believe this to be the case, especially in contrast to
the problems in interpreting the results from observational studies (section 3.6).
3.3 Meta-analysis of RRs
As discussed above, we focus on the deaths ascribed to breast cancer in 10 of the
11 randomised trials (excluding Edinburgh) and the meta-analysis conducted in the
Cochrane Review, using 13 years of follow-up (analysis 1.2 in Gøtzsche and Nielsen,
2011). We do not distinguish the trials labelled ‘adequately randomised’ and ‘sub-optimally
randomised’ in the Cochrane Review, but consider the totality of evidence across all
the trials. We also use random-effects rather than fixed-effect meta-analysis to estimate
an average effect across the trials. Using random effects acknowledges that the trials
may be estimating different quantities, which is likely given their clinical heterogeneity,
whereas a fixed-effect analysis estimates an assumed common effect across all the
trials. The results are shown in Figure 2 along with the RRs of breast cancer mortality.
The overall RR, comparing invited with control women, is 0.80 (95% CI 0.73–0.89).
There was some heterogeneity in the RRs from different trials, but this was not statistically
significant (Figure 2). Thus, the RR reduction in breast cancer mortality in the groups
invited to screening is estimated as 20% (95% CI 11–27%).
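As an arithmetic check on the figures above, the RR reduction and its interval are
obtained by subtracting the RR and its confidence limits from 1, with the limits
swapping places. The following is a minimal sketch; the function name is illustrative
and is not taken from the Cochrane Review or from the panel's analysis.

```python
def rr_to_reduction(rr, ci_low, ci_high):
    """Convert a relative risk and its CI into an RR reduction and its CI.

    All values are fractions. The upper RR limit gives the lower limit of the
    reduction, and the lower RR limit gives the upper limit of the reduction.
    """
    return 1 - rr, 1 - ci_high, 1 - ci_low


reduction, low, high = rr_to_reduction(0.80, 0.73, 0.89)
print(f"RR reduction {reduction:.0%} (95% CI {low:.0%} to {high:.0%})")
```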
The RR for women invited to screening is attenuated compared with that for women who
actually attend screening (Cuzick et al, 1997). This is because some invited women
do not attend, and they may be assumed to get no benefit from the invitation. If the
underlying rate of breast cancer mortality in non-attenders is the same as in attenders,
one may estimate the RR reduction in attenders as the RR reduction in those invited
divided by the (average) attendance rate. Taking the typical attendance in the trials
as about 80% (Table 1), this would give 20% divided by 0.80, or 25%. However, this
calculation is incorrect as the underlying risk is different in those not attending
screening (Zackrisson et al, 2004; Moss et al, 2006). Without this extra information,
which is not available for all trials, the calculation of the RR reduction in those
attending screening is not possible. In contrast, the calculation can be made, irrespective
of underlying risk differences, for the absolute risk reduction (section 3.4). We
note that the coverage rate in the UK NHS screening programme is similar to that in
the trials, at 77% (The NHS Information Centre). Some non-systematic (opportunistic)
screening occurred in the control groups of the trials, but detailed information is
not available. This is ignored in our calculations, and will lead to the effect of
attending screening being somewhat underestimated.
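The reasoning in this paragraph can be illustrated with a simple two-group model. The
sketch below is an assumption made for illustration only: invited women are split into
attenders and non-attenders, non-attenders get no benefit, and the underlying risks
used are hypothetical values rather than figures from any trial. It shows that dividing
the RR reduction by the attendance rate recovers the effect in attenders only when the
two groups share the same underlying risk, whereas the same division applied to the
absolute risk reduction does not depend on the risk in non-attenders.

```python
def invited_rrr(attendance, efficacy, risk_attenders, risk_non_attenders):
    """RR reduction seen in the invited group under a two-group model.

    attendance: fraction of invited women who attend
    efficacy: RR reduction among women who actually attend
    risk_*: underlying breast cancer mortality risk without screening
    """
    control_risk = (attendance * risk_attenders
                    + (1 - attendance) * risk_non_attenders)
    deaths_avoided = attendance * risk_attenders * efficacy
    return deaths_avoided / control_risk


def invited_arr(attendance, efficacy, risk_attenders):
    """Absolute risk reduction in the invited group.

    The risk in non-attenders cancels out, so it is not an argument.
    """
    return attendance * risk_attenders * efficacy


# Hypothetical inputs: 80% attendance, 25% RR reduction in attenders.
same_risk = invited_rrr(0.80, 0.25, 0.02, 0.02)
higher_risk_in_non_attenders = invited_rrr(0.80, 0.25, 0.02, 0.03)

print(f"naive estimate, equal risks:   {same_risk / 0.80:.1%}")
print(f"naive estimate, unequal risks: {higher_risk_in_non_attenders / 0.80:.1%}")
print(f"ARR in attenders:              {invited_arr(0.80, 0.25, 0.02) / 0.80:.2%}")
```

With unequal risks the naive division no longer returns the 25% that was put into the
model, which is the difficulty described above.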
Other estimates of overall RR
Other meta-analyses of the breast cancer screening trials have given different estimates
of the RR reduction. We summarise some of these below.
The Cochrane Review undertook a fixed-effect meta-analysis of the above trials with
13 years follow-up, and reported an estimated RR of 0.81 (95% CI 0.74–0.87). As expected,
the fixed-effect analysis gives a slightly narrower CI, but the estimated average
RR reduction of 19% is similar to the figure of 20% above.
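The relationship between the two kinds of interval can be sketched in code. The example
below uses the DerSimonian-Laird method-of-moments estimator, which is one common
random-effects approach; the report does not state which estimator was used, so this
choice is an assumption. The log RRs and standard errors are hypothetical inputs and
are not the trial data.

```python
import math


def pool(log_rrs, ses, tau2=0.0):
    """Inverse-variance pooled log RR and its standard error.

    tau2 is the between-trial variance; tau2 = 0 gives the fixed-effect result.
    """
    weights = [1 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, log_rrs)) / total
    return estimate, math.sqrt(1 / total)


def dersimonian_laird_tau2(log_rrs, ses):
    """Method-of-moments estimate of the between-trial variance."""
    w = [1 / se ** 2 for se in ses]
    fixed, _ = pool(log_rrs, ses)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(w) - 1)) / c)


# Hypothetical inputs, NOT the trial data.
log_rrs = [math.log(x) for x in (0.70, 0.80, 0.85, 0.98)]
ses = [0.10, 0.08, 0.12, 0.09]

fixed_est, fixed_se = pool(log_rrs, ses)
tau2 = dersimonian_laird_tau2(log_rrs, ses)
random_est, random_se = pool(log_rrs, ses, tau2)

for name, est, se in [("fixed", fixed_est, fixed_se),
                      ("random", random_est, random_se)]:
    low, high = est - 1.96 * se, est + 1.96 * se
    print(f"{name}: RR {math.exp(est):.2f} "
          f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Because the between-trial variance is added to every trial's variance, the random-effects
standard error can never be smaller than the fixed-effect one.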
If women <50 years in the above trials are excluded, the overall RR reported in the
Cochrane Review (analysis 1.6, Gøtzsche and Nielsen, 2011) is 0.77 (95% CI 0.69–0.86).
So the RR reduction is estimated as 23%, slightly more than the 20% above based on
all age groups.
The Cochrane Review (Gøtzsche and Nielsen, 2011) focused on the Canada, Malmö, and
UK Age trials as the only ‘adequately randomised’ trials. The estimated RR of breast
cancer mortality over 13 years follow-up for invited vs control groups in these trials
was 0.90 (95% CI 0.79–1.02), whereas in the trials considered ‘sub-optimally randomised’
it was 0.75 (0.67–0.83). As a compromise between these two estimates, the authors
concluded that a 15% RR reduction was plausible.
The US Task Force (Nelson et al, 2009) provided estimated RRs of breast cancer mortality
of 0.86 (95% CI 0.75–0.99) for women aged 50–59 years invited to screening, and of
0.68 (95% CI 0.54–0.87) for those aged 60–69 years. These correspond to RR reductions
of 14% and 32%, respectively, with an inverse variance weighted average of 19%.
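The weighted average quoted above can be approximately reproduced from the two published
intervals. The sketch below assumes that the weighting is done on the log RR scale and
that each 95% CI is symmetric on that scale; the source does not state either point, so
both are assumptions.

```python
import math


def log_rr_and_se(rr, ci_low, ci_high):
    """Log RR and its standard error, recovered from a 95% CI.

    Assumes the interval is symmetric on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(rr), se


# (RR, lower limit, upper limit) for ages 50-59 and 60-69, as quoted in the text.
estimates = [(0.86, 0.75, 0.99), (0.68, 0.54, 0.87)]

pairs = [log_rr_and_se(*e) for e in estimates]
weights = [1 / se ** 2 for _, se in pairs]
pooled_log_rr = sum(w * y for w, (y, _) in zip(weights, pairs)) / sum(weights)
pooled_reduction = 1 - math.exp(pooled_log_rr)

print(f"Pooled RR reduction: {pooled_reduction:.1%}")
```

The narrower interval for ages 50–59 gives that estimate the larger weight, which is
why the average lies closer to 14% than to 32%.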
The Canadian Task Force (Canadian Task Force on Preventive Health Care, 2011) gave
an estimate of the RR of breast cancer mortality for invited vs control groups of
0.79 (95% CI 0.68–0.90) for women aged 50–69 years, a RR reduction of 21%. Routinely
screening for breast cancer with mammography every 2–3 years for this age group was
rated as a weak recommendation, based on moderate-quality evidence according to GRADE
criteria (Schünemann et al, 2011).
A review by Duffy et al (2012) of all the trials and age groups gave an overall RR
of 0.79 (95% CI 0.73–0.86) comparing invited with control groups, corresponding to
a 21% RR reduction in breast cancer mortality.
Different meta-analyses include different trials, durations of follow-up, and definitions
of outcome. Nevertheless, there is general agreement in their estimates, of about
a 20% RR reduction in breast cancer mortality from invitation to screening.
Generalisability of RRs
A key issue is whether the RR reduction in breast cancer mortality observed in the
trials may be taken as applying, at least approximately, to the current UK screening
programmes. This is a judgement about external validity, rather than an issue for
which much direct empirical evidence is available. As always in policy decision making,
we need to use evidence from studies undertaken in the past to make an inference about
what is likely in the future. Although RRs are often much more generalisable across
contexts than absolute risk differences, it is clearly plausible that RRs could change
in new situations. Of particular concern in breast screening is that many of the trials
were undertaken a long time ago, that the techniques of mammography have changed considerably,
that DCIS is now commonly diagnosed through screening (section 4.6), that the treatments
for breast cancer, particularly the drug treatment that can eradicate microscopic
spread, have become more effective, and that the overall mortality rate from breast
cancer has decreased in the United Kingdom and other countries. These points were
put to the panel by some expert witnesses. One could therefore argue that breast screening
is now less effective/relevant because even later stage cancers can be treated and/or
cured, so there is less need to diagnose breast cancers earlier. However, there is
a counter argument that because the systemic drug treatments are only partially effective,
it could be that the major improvements that drug treatments have brought in cure
rates are in fact in part due to breast screening: by diagnosing more cancers at an
earlier stage, contemporary drug treatments have a better chance of eradicating microscopic
disease, and thus the gains in survival would not have been as great if breast screening
did not exist.
Both views have some supporting arguments, but the panel found no convincing evidence
that one or other was more likely to be correct. Thus, the panel's view is that the
appropriate way to view the benefits of screening and those of better
treatments is as independent effects, and thus that the estimates of
the relative reduction in breast cancer mortality achieved with screening are the
same now as 20 years ago. However, the uncertainty about whether there could be an
interaction between the benefits of screening and of contemporary treatments is not
a reason for stopping breast screening.
Particular aspects for which there is at least some evidence about the external validity
of the trials relate to age, screening intensity, and follow-up time. The RR does
not appear to change much across the age range 50–69 years (Nyström et al, 2002),
but it may be reduced below the age of 50 (Canadian Task Force on Preventive Health
Care, 2011). The RR does not appear to depend strongly on the number of screens, or
the screening interval, at least across the ranges studied in the trials. The only
randomised trial that compared different screening intervals is inconclusive (Breast
Screening Frequency Trial Group, 2002). Reports from trials with long follow-up suggest
that little benefit in terms of breast cancer mortality is seen in the first 5 years
after starting screening, and that the benefit lasts for at least 10 years after cessation
of screening. This is not surprising, given the slow progression rates of many breast
cancers.
Conclusion
The panel concludes that the current screening programmes in the United Kingdom, which
invite women aged 50–70 every 3 years to undergo mammography, are likely to deliver
about a 20% reduction in breast cancer mortality at ages 55–79 years. Clearly, there
is uncertainty in this figure. In addition to the uncertainty owing to the limited
numbers of breast cancer deaths across the trials, there are potential biases in the
trials and concerns about the generalisability of results from the trials to the current
UK screening programmes. We note, however, that the level of disagreement in the literature
about the RR reduction is minor in comparison to the controversy about the absolute
risk reduction.
3.4 Absolute risk reduction
The above discussion suggests a natural way to estimate the absolute risk reduction
that applies to the current screening programmes in the United Kingdom. For women
aged 50 invited to screening, we assume no benefit in breast cancer mortality until
age 55, a 20% reduction at ages 55–79, and no change in the rates of other causes
of death. An estimated 1.70% of UK women aged 50 are currently expected to die from
breast cancer between the ages of 55 and 79; this is calculated from UK mortality
rates (2008–2010) and takes into account the risks of dying from other causes. Since
the UK programme has existed since the late 1980s, one may assume that this risk has
already been reduced by 20% through screening. Hence, the risk without the screening
programme would have been 2.13% (as 1.70/2.13=0.80), and the estimated absolute risk
reduction is 2.13−1.70=0.43%.
The number of women needed to be invited for screening for 20 years starting at age
50 in order to prevent one death from breast cancer is therefore 1/0.43%=235. An alternative
way of expressing this is that, for every 10 000 women invited into the screening
programme at age 50, about 43 deaths from breast cancer would be prevented.
The absolute risk reduction for women attending screening can be estimated as the
absolute risk reduction in those invited divided by the average coverage rate in the
NHS breast screening programme (77%), so about 0.43%/0.77=0.56%. The number of women
needed to be screened for 20 years to prevent one death from breast cancer is then
1/0.56%=180. For every 10 000 women attending screening from age 50–70 years, about
56 deaths from breast cancer would be prevented.
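The arithmetic of the preceding three paragraphs can be retraced as follows. This is a
sketch for checking the figures, with variable names chosen for illustration. It carries
unrounded values through each step, so the intermediate results can differ in the last
digit from those quoted above, which use the rounded figures 2.13% and 0.43%.

```python
current_risk = 0.0170   # risk of breast cancer death at ages 55-79, with screening
rr = 0.80               # relative risk for women invited vs not invited
coverage = 0.77         # average coverage of the NHS breast screening programme

risk_without_screening = current_risk / rr            # about 2.13%
arr_invited = risk_without_screening - current_risk   # about 0.43%
arr_attending = arr_invited / coverage                 # about 0.55-0.56%

number_needed_to_invite = 1 / arr_invited
number_needed_to_screen = 1 / arr_attending

print(f"Risk without screening: {risk_without_screening:.2%}")
print(f"Number needed to invite: {number_needed_to_invite:.0f}")
print(f"Number needed to screen: {number_needed_to_screen:.0f}")
print(f"Deaths prevented per 10 000 invited: {arr_invited * 10_000:.1f}")
print(f"Deaths prevented per 10 000 attending: {arr_attending * 10_000:.1f}")
```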
The above calculations are based on the same principles as those used in some publications
(Advisory Committee on Breast Cancer Screening, 2006). Essentially, the RR reduction
from the trials is regarded as approximately generalisable to the current UK screening
programmes, and the corresponding absolute risk reduction is calculated by applying
this RR reduction to the national rates of breast cancer mortality for an appropriate
age group. The considerable uncertainty in the estimated RR reduction of 20%, as discussed
in section 3.3, of course carries through to these estimates of absolute risk reduction.
The NHS screening programme estimates that 1400 lives are saved per year in England
owing to breast screening (Advisory Committee on Breast Cancer Screening, 2006). For
comparison and illustrative purposes, the panel estimates that for the 307 000 women
(aged 50–52) who each year receive their first invitation to a 20-year screening programme
(3-year average 2008/2009–2010/2011, The NHS Information Centre), 0.43% of 307 000,
or about 1300, deaths from breast cancer per year are prevented. This is close to
the NHS screening programme's estimate.
Different methods and estimates in the literature
The marked difference in estimates of absolute risk reduction proposed in the literature
is one of the greatest sources of controversy about the value of breast cancer screening
(McPherson, 2010). The different estimates stem from the very varied methods used
for their calculation. When calculations are made directly from the trials' data themselves,
the absolute risk reduction depends overwhelmingly on the underlying risk of breast
cancer, which is principally governed by the age groups considered, the length of
follow-up, and the population studied. Although this is obvious, it has also been
empirically shown by comparing different durations of follow-up in the Swedish Two
County trial (Tábar et al, 2011).
The Cochrane Review (Gøtzsche and Nielsen, 2011) focused on the Canada, Malmö, and
UK Age trials as the only ‘adequately randomised’ trials. The absolute risk of breast
cancer death in the control groups of these trials was low (overall rate of 0.33%),
partly because of the inclusion of the large UK Age trial (women initially aged 39–41)
and the 13-year follow-up period considered rather than the 25-year period from age
55–79, used above by the panel. With the Cochrane Review authors' estimated 15% RR
reduction, this leads to an estimated absolute risk reduction of 0.05%, or equivalently
that 2000 women need to be invited to screening to prevent one breast cancer death.
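The dependence on underlying risk can be made concrete by applying each RR reduction to
its own baseline. The sketch below uses only figures quoted in this section; the helper
name is illustrative. The panel's baseline of 2.13% is the estimated risk without
screening from the start of section 3.4.

```python
def absolute_risk_reduction(baseline_risk, rr_reduction):
    """Absolute risk reduction from applying an RR reduction to a baseline risk."""
    return baseline_risk * rr_reduction


# Cochrane Review: 13-year control-group risk of 0.33%, 15% RR reduction.
arr_cochrane = absolute_risk_reduction(0.0033, 0.15)

# Panel: risk of 2.13% at ages 55-79 without screening, 20% RR reduction.
arr_panel = absolute_risk_reduction(0.0213, 0.20)

print(f"Cochrane: ARR {arr_cochrane:.2%}, invite about {1 / arr_cochrane:.0f}")
print(f"Panel:    ARR {arr_panel:.2%}, invite about {1 / arr_panel:.0f}")
print(f"Ratio of baseline risks: {0.0213 / 0.0033:.1f}")
```

Most of the gap between the two numbers needed to invite comes from the roughly six-fold
difference in baseline risk rather than from the difference between 15% and 20%.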
An entirely different estimate is given by Duffy et al (2010) based on 22 years of
follow-up for those aged 50–69 in the Swedish Two County trial, which estimated a
38% reduction in breast cancer mortality. The calculation considers the absolute risk
reduction per woman screened across the 7 years of screening in the trial, and makes
the strong assumption that the absolute benefits can simply be multiplied up to reflect
the 20 years of screening in the UK programmes. This leads to an estimated absolute
risk reduction of 0.88% in women screened, or equivalently that 113 women need to
be screened to prevent one breast cancer death.
The US Task Force (Nelson et al, 2009) considered a period of 7 years of invitation
to screening and 13 years of follow-up after first invitation.
For ages 50–59 years, they estimated that 1339 women needed to be invited to prevent
one death from breast cancer. For ages 60–69 years, their corresponding estimate was
377 women.
The Canadian Task Force (Canadian Task Force on Preventive Health Care, 2011) estimated
from the trials that screening 720 women aged 50–69 years once every 2–3 years for
about 11 years would prevent one death from breast cancer.
Beral et al (2011) summarised various published estimates of absolute risk reduction
from the literature, and concluded that around one breast cancer death would be prevented
in the long term for every 400 women aged 50–70 years regularly screened over a 10-year
period, based on a previous review (Advisory Committee on Breast Cancer Screening,
2006).
From the above examples, it is clear that different methods of estimation give about
a 20-fold difference in the estimates of absolute risk reduction. The panel's view
is that to estimate the impact of the UK screening programmes on absolute risk of
dying of breast cancer, it is necessary to consider the relevant underlying risk of
breast cancer to which the RR reduction from the trials should apply. The panel believes
this is best derived from the current UK national rate of breast cancer deaths for
women aged 55–79 years. Calculations made directly from the absolute risks observed
in the trials are heavily, and often misleadingly, influenced by the age groups included
and the length of follow-up available (Beral et al, 2011). Estimates also depend on
whether they are expressed per woman invited or per woman screened. We note, however,
to the extent that the absolute rate of breast cancer mortality in the United Kingdom
is currently declining, the absolute risk reduction from the UK screening programme
would also be expected to decline correspondingly in the future.
Life expectancy gained
A reduction in the risk of breast cancer will lead to an increase in life expectancy.
As breast cancer is only one of many causes of death, the average gain in life expectancy
from the UK screening programme is likely to appear modest. An estimate can easily
be derived by contrasting the life expectancy for women aged 50, using current national
rates of breast cancer mortality and deaths from other causes, to that which would
apply if the rates of breast cancer mortality were 25% higher in each year from age
55–79 years. (25% higher corresponds to the assumed 20% benefit from screening, as
1.25=1/0.80.) This calculation leads to an estimate of 0.073 years (or 27 days) of
life gained on average for each woman aged 50 invited to screening. To put this in
perspective, the panel noted that abolition of all deaths from breast cancer completely
would add 159 days on average to life expectancy for women aged 50.
We also note that this is a crude average of a zero gain for the vast majority of
women and a substantial gain for a few. Alternative but equivalent ways of expressing
this gain are as follows: (a) for the 307 000 women aged 50–52 who are invited for
screening each year, about 22 000 years of life will be saved; (b) for each 10 000
women invited to screening, 730 years of life will be saved; (c) for each 10 000 attending
screening, about 950 years of life will be saved; (d) given that 1 in about 180 women
attending screening avoid breast cancer death, such a woman would expect to gain on
average an extra 17 years of life.
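The four equivalent expressions follow from the average gain of 0.073 years by simple
scaling, as the sketch below retraces. Variable names are illustrative; the inputs are
the figures quoted in this section and in section 3.4.

```python
gain_per_woman_invited = 0.073     # years of life gained per woman invited
annual_cohort = 307_000            # women receiving a first invitation each year
coverage = 0.77                    # NHS programme coverage
deaths_prevented_per_10000_attending = 56

years_per_cohort = gain_per_woman_invited * annual_cohort              # (a)
years_per_10000_invited = gain_per_woman_invited * 10_000              # (b)
years_per_10000_attending = years_per_10000_invited / coverage         # (c)
years_per_death_avoided = (years_per_10000_attending
                           / deaths_prevented_per_10000_attending)     # (d)

print(f"(a) {years_per_cohort:.0f} years per annual cohort")
print(f"(b) {years_per_10000_invited:.0f} years per 10 000 invited")
print(f"(c) {years_per_10000_attending:.0f} years per 10 000 attending")
print(f"(d) {years_per_death_avoided:.1f} years per death avoided")
```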
3.5 Other considerations
Edinburgh trial
The Edinburgh trial was the only UK trial in an age group that is within that of the
national screening programme. However, as discussed in section 3.2, we excluded this
trial because problems in the cluster randomisation led to a severe imbalance in socioeconomic
status of the women between the groups, and socioeconomic status influences, in opposite
directions, the risk of developing breast cancer and of dying from breast cancer.
At 14 years of follow-up, the unadjusted results showed a 13% reduction in breast
cancer mortality. However, on adjusting for socioeconomic status, the rate ratio was
0.79 (95% CI 0.60–1.02), a RR reduction of 21% (Alexander et al, 1999). Thus, although
doubts must remain about the validity of this latter estimate, we note that it is very
much in line with the figure of 20% we have used above.
Other outcomes
In the preceding sections, we have focused exclusively on breast cancer mortality.
Owing to the concerns about whether such deaths are reliably adjudicated in the trials,
some authors have suggested that this has led to exaggerated estimates of the RR reduction,
and that the outcomes of death from any cancer, or death from any cause, are the appropriate
ones for judging the impact of breast screening on mortality. The panel disagrees
with this: evaluating all-cancer or all-cause deaths in the trials will lack power
because breast cancer deaths represent only a small proportion within these categories.
In particular, a 20% RR reduction in breast cancer deaths for ages 55–79 years would
yield only 3.0% and 1.2% RR reductions in all-cancer and all-cause deaths, respectively.
The trials are not of sufficient size (in terms of numbers of women and length of
follow-up) to allow such small RR reductions to be reliably estimated. Hence, a statistically
non-significant effect for all-cancer or all-cause deaths in the trials cannot be
interpreted as evidence against a reduction in breast cancer deaths.
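The loss of power can be illustrated with a rough calculation. The sketch below rests on
two assumptions that are not in the report: the shares of deaths due to breast cancer are
back-calculated from the 3.0% and 1.2% figures above, and the number of deaths needed to
detect an effect is taken to scale with 1/(log RR)^2, a standard approximation for
comparing two groups.

```python
import math

rrr_breast = 0.20        # RR reduction in breast cancer deaths, ages 55-79
rrr_all_cancer = 0.030   # corresponding RR reduction in all-cancer deaths
rrr_all_cause = 0.012    # corresponding RR reduction in all-cause deaths

# Back-calculated shares of deaths due to breast cancer (inferred, not reported).
share_of_cancer_deaths = rrr_all_cancer / rrr_breast
share_of_all_deaths = rrr_all_cause / rrr_breast


def relative_events_needed(rrr_small, rrr_reference):
    """How many times more deaths are needed to detect the smaller effect.

    Uses the rough rule that required events scale with 1 / (log RR)^2.
    """
    return (math.log(1 - rrr_reference) / math.log(1 - rrr_small)) ** 2


for label, rrr, share in [
    ("all-cancer", rrr_all_cancer, share_of_cancer_deaths),
    ("all-cause", rrr_all_cause, share_of_all_deaths),
]:
    needed = relative_events_needed(rrr, rrr_breast)
    available = 1 / share
    print(f"{label}: breast cancer share {share:.0%}, about {needed:.0f}x "
          f"the deaths needed, about {available:.0f}x available")
```

Under these assumptions the number of deaths required grows much faster than the number
of deaths the broader end point supplies.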
Some authors have argued that changes in the incidence of more advanced breast cancer,
whether defined as above a certain tumour size or with spread to the ipsilateral axillary
nodes, are a useful surrogate indicator of the effect of screening on breast cancer
mortality in the trials, as the ultimate risk of dying of breast cancer depends in
part on the stage of disease at first presentation. Although, on average, one could
expect a breast cancer screening programme to lead to diagnosis of breast cancers
at an earlier stage, this approach cannot directly exclude lead time effects.
The situation is further complicated by the issue of interval cancers, which have
been shown in more than one study, as compared with screen-detected cancers, to be
more often high grade, which is itself predictive of a poorer prognosis. However,
what is less clear is whether the prognosis of a breast cancer is determined only
by the stage when diagnosed, or whether in the absence of a screening programme the
underlying biology is the main determinant of outcome, and this in turn influences
when the cancers present. Thus, for those cancers diagnosed earlier by screening,
it is not clear which, if any, of the clinical markers of prognosis (stage, size,
grade etc.) is the best predictor of ultimate outcome, or whether it is some other
fundamental characteristic assessable only by molecular biology.
Therefore, there appears to be little reason to use these surrogate outcomes as evidence
for or against the benefits of screening, as substantial assumptions are needed to
estimate the consequent effect on breast cancer mortality. Only if one wanted to disregard
completely the evidence about breast cancer mortality from the trials, would the use
of such surrogate outcomes have value.
There are possibilities of specific harms of screening in terms of induction of other
cancers through the X-rays used in mammography or the radiotherapy or drug therapy
used to treat breast cancer, and of coronary damage and deaths through radiotherapy
(especially of the left breast). These potential harms are discussed in section 5.2.
Statistical and other uncertainties
It is conventional that results from statistical analyses, including meta-analyses,
are presented with a measure of statistical uncertainty such as 95% confidence limits.
Although these are helpful in giving an impression of the possible influence of the
play of chance (given the sample sizes that are available in the studies considered),
they fail to represent the uncertainties arising from possible biases (from lack of
internal validity of the studies) or from generalisation from the trials to a
new context (external validity). So, the CI given for the RR reduction of breast cancer
mortality from a meta-analysis of the trials is an understatement of the uncertainty
about the RR reduction that applies to the UK screening programmes. A RR reduction
of 20% represents the panel's judgement of the evidence, and should be regarded as
an approximate figure rather than a precise estimate.
3.6 Observational studies
In addition to the trials, the panel also considered the value of observational studies
in estimating the impact of screening on breast cancer mortality. The RCTs of mammographic
screening were conducted at least 20 years ago and most over 30 years ago. Observational
studies may help to quantify the effects of screening in an era with major improvements
in diagnostic imaging, clinical care, and patient outcomes, as many of the observational
studies are more recent than the trials. Both proponents and critics of screening
have suggested that the observational studies are more relevant today than the RCTs.
However, these studies are beset by many more biases with consequent problems of interpretation.
It is also possible that they are more prone to selective reporting than trials, in
that the results obtained determine the enthusiasm of the authors and journals for
publication.
The biases inherent in observational studies differ by type of study. All share the
common problem of potential lack of comparability of screened and unscreened women.
It is this feature that the RCTs are designed to address. Each observational study
design has strengths and weaknesses and, within each class, specific studies vary
in their methods and credibility. The relative merits and problems of the various
observational study designs are hotly contested both in the literature and in the
evidence the panel heard.
Ecological and time-trend studies
Some observational studies compare time trends for breast cancer mortality in countries
or areas before and after the introduction of screening, or concurrently between areas
with and without screening. In the first type of study, extrapolation of time trends
demands that decisions are made, for example, about the linearity or otherwise of
the trend, the choice of time periods considered as ‘before' and ‘after' screening,
and the age groups included. In the second type of study, choices have to be made
about the areas to include, the time period considered, and the age groups included.
Such decisions, which can appear to have been made rather arbitrarily, can have a
profound impact on the estimates obtained. Lack of comparability and different time
trends in the groups being contrasted could lead to substantial bias. For these reasons
the panel does not consider that these types of studies provide reliable evidence
on the effect of screening on breast cancer mortality, and amongst observational study
designs we focus instead on case–control studies and incidence-based mortality studies.
Case–control studies
Case–control studies compare the history of breast screening attendance between women
dying of breast cancer and control women who did not die of breast cancer. Case–control
studies are prone to a number of potential biases. The main problem with case–control
studies is that those attending breast screening are different from those who do not
attend. This is referred to as self-selection bias or the ‘healthy screened effect'.
Attendance is influenced by social and demographical factors that are also likely
to be related to the risk of dying from breast cancer, with the resulting bias potentially
exaggerating the estimated effect of screening. Also, the existence of a breast screening
programme in an area may be associated with better treatment of breast cancer. Therefore,
women diagnosed with breast cancer in an area with a breast screening programme may
also receive more effective treatment than women where there is no such programme.
This would bias the study in favour of screening. Attempts are made to correct for
the resulting biases by choice of controls and statistical adjustment (Connor et al,
2000; Duffy et al, 2002).
Some of the expert witnesses who gave evidence to the panel felt that case–control
studies provided the most reliable form of observational data while others believed
the opposite. The panel undertook a review of the individual characteristics of a
number of case–control studies to assess the potential bias of each one (Appendix
3). In general, the studies matched controls to cases by both age and residence but
some matched on just one of these variables. Self-selection bias was discussed in
around three-quarters of the studies and statistically controlled for, using a variety
of methods, in less than half of the studies (Appendix 3).
The case–control studies show a more favourable benefit of screening than the
trials. The panel believes that this is plausibly because of inadequate control for
self-selection bias rather than because screening is actually far more beneficial now
than in the trials. Attempts to correct for self-selection bias were based on information
outside of the study itself (either from a previous time period, or from other geographical
areas) that may not be fully relevant. When adjustment was made, the apparent benefit
of screening was diminished. The bias that screening could be associated with better
treatment was controlled for in studies conducted in countries with uniform treatment
services.
In conclusion, the panel notes that the beneficial effects of screening are in the
same direction as those seen in the trials, but that control for self-selection bias
may be inadequate in many of the studies.
Incidence-based mortality studies
Njor et al (2012) conducted a review of European studies on the impact of service
mammography screening on breast cancer mortality using incidence-based mortality.
In these studies, only breast cancer deaths occurring in women with breast cancer
diagnosed after their first invitation to screening are included. They classified
the studies according to type of comparison group. These were (1) women not yet invited,
(2) historical data from the same region as well as historical and current data
from a region without screening, and (3) a historical comparison group combined with
data for non-participants.
They found that the effect of screening on breast cancer mortality varied across studies.
The RRs were 0.76–0.81 in group 1; 0.75–0.90 in group 2; and 0.52–0.89 in group 3.
Study databases overlapped in both Swedish and Finnish studies, adjustment for lead
time was not optimal in all studies, and some studies had various other methodological
limitations. There was less variability in the RRs after allowing for the methodological
shortcomings. On the basis of evidence from the most reliable incidence-based mortality
studies, they concluded that the most likely impact of European breast screening programmes
was a breast cancer mortality reduction of 26% (95% CI 13–36%) among women invited
for screening and followed up for 6–11 years.
Conclusion
Many observational studies have been published, and their conclusions hotly contested.
In general, the more contemporaneous case–control and incidence-based mortality studies
support the evidence from the trials that screening does have a beneficial effect
on mortality. The panel's view is that the trials provide more reliable evidence for
an estimate of mortality reduction. Nevertheless, the observational studies support
the hypothesis that screening continues to be beneficial in an era of improved treatment.
4. Overdiagnosis
4.1 Introduction
The purpose of breast screening is to detect cancer early, before it has come to clinical
attention. If all cancers would eventually be clinically recognised and treatment
was the same and equally effective no matter when the tumour was diagnosed, then screening
would be redundant. However, the understanding is that if the cancer is diagnosed
earlier, then treatment will be more effective. This is the assumption on which screening
is based. The evidence reviewed in section 3 supports that assumption.
As cancers are detected earlier because of screening, we expect the cancer incidence
to be higher among screened women during the screening period (the time period between
the detection of a cancer at screening and when it would have presented clinically
is the ‘lead time' and is an inevitable part of screening). In principle, when screening
ceases the incidence should fall back so that by the end of the screening period plus
lead time, the cumulative incidence in the screened and control populations should
be the same.
Some screen-detected cancers, however, may never progress to become symptomatic (clinically
detectable) while some women would die from another cause before the cancer became
evident. This adverse consequence (harm) of screening is called overdiagnosis or overdetection.
It is variously defined as the ‘detection of cancers on screening that would not have
been found were it not for the screening test' (IARC, 2002), or ‘that would never
have clinically surfaced in the absence of screening' (Seigneurin et al, 2011) or
‘that would not have presented clinically during the woman's lifetime (and therefore
would not have been diagnosed in the absence of screening)' (Biesheuvel et al, 2007).
Thus, it refers to all cancers, invasive or in situ.
Underpinning the concept of overdiagnosis is the belief that cancers grow at variable
rates, as depicted, for example, in Figure 3A (Esserman et al, 2009; Elmore and Fletcher,
2012). Some screen-detected cancers may progress so slowly that they would never
have presented clinically; theoretically, some may be static or even regress but the
practical effect is the same. Detection of these cancers turns women into patients,
leads to surgery and other treatments that by definition are not beneficial for these
women and can cause harm, and adversely affects their quality of life.
As cancers are diagnosed earlier owing to screening, we expect cancer incidence to
be higher among screened than unscreened women during the screening period. However,
when screening ceases, the incidence should fall back (sometimes referred to as the
compensatory drop). If there is no overdiagnosis, the cumulative incidence in the
screened and unscreened women will equalise after screening ceases, once a period
equivalent to the lead time has elapsed (Figure 3B, left). If there is overdiagnosis,
however, the cumulative incidence will remain higher in the screened group and not
equalise over time (Figure 3B, right).
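The compensatory drop can be illustrated with a deliberately simple toy model. The
sketch below is not drawn from any trial: the function name, the constant annual
incidence, the fixed 3-year lead time, the 10-year screening period, and the number
of overdiagnosed cancers per year are all hypothetical values chosen for illustration.
It also assumes that every progressive cancer is brought forward by exactly the lead
time, which real cancers do not obey.

```python
# Toy model of cumulative incidence with and without overdiagnosis.
# Every number here is hypothetical; none comes from the trials discussed.

def cumulative_incidence(years, per_year, screen_years=0, lead_time=0,
                         overdiagnosed_per_year=0):
    """Cumulative number of diagnoses at the end of each year 0..years-1."""
    diagnoses = [0] * years
    # Progressive cancers: `per_year` of them would present clinically in year t.
    # Look `lead_time` years ahead so cancers pulled forward by screening are counted.
    for t in range(years + lead_time):
        if screen_years and t < screen_years + lead_time:
            d = max(t - lead_time, 0)  # detected early at a screen
        else:
            d = t                      # detected at clinical presentation
        if d < years:
            diagnoses[d] += per_year
    # Overdiagnosed cancers: found only because of screening, never otherwise.
    for s in range(min(screen_years, years)):
        diagnoses[s] += overdiagnosed_per_year
    total, cumulative = 0, []
    for n in diagnoses:
        total += n
        cumulative.append(total)
    return cumulative

control = cumulative_incidence(20, 10)
no_overdx = cumulative_incidence(20, 10, screen_years=10, lead_time=3)
with_overdx = cumulative_incidence(20, 10, screen_years=10, lead_time=3,
                                   overdiagnosed_per_year=2)

print(control[9], no_overdx[9], with_overdx[9])     # 100 130 150
print(control[12], no_overdx[12], with_overdx[12])  # 130 130 150
print(control[19], no_overdx[19], with_overdx[19])  # 200 200 220
```

Under these assumptions the screened arm is 30 cancers ahead at the end of screening,
the gap closes 3 years later when there is no overdiagnosis, and a constant gap of
20 cancers persists when there is.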
Some overdiagnosis is seen as inevitable – some women will die before their screen-detected
cancer would have presented symptomatically. Establishing its frequency is critically
important in weighing up the benefits and harms of screening, both for populations
and individual women. A big challenge is to get unbiased estimates of the risk. Opinions
on the frequency of overdiagnosis range from it being trivial and unimportant to women
to being very important and swamping any benefit of screening.
Whether a particular woman has had an overdiagnosed cancer, or whether individual
tumours are overdiagnosed, cannot be judged. It is only possible to estimate frequency
of overdiagnosis. The issue for the UK screening programmes is the magnitude of overdiagnosis
in women who have been in a screening programme from age 50–70, then followed for
the rest of their lives. There are no data to answer this question. Any estimate will
therefore be, at best, provisional.
4.2 Sources of data on overdiagnosis
Overdiagnosis can be estimated from RCTs or observational studies. Valid estimates
depend on the screened and unscreened women having similar underlying risks of breast
cancer, and on the effect of lead time having been accounted for (Puliti et al, 2012).
Overdiagnosed cancers are not all those detected earlier by screening but the subset
that would not otherwise have been detected at all.
Randomised controlled trials have the advantage that by design they compare groups
of women with the same average prognosis. There are disadvantages of the available
RCTs though, including a screening phase that was always shorter than that employed
in the NHS national screening programmes, and which varies across the RCTs.
The most reliable estimates of overdiagnosis are from those RCTs in which there was
no screening of the control group at the end of the screening period. As screening
advances detection of breast cancer, follow-up should extend beyond the screening
period to allow a catch up of diagnoses in the unscreened group. In essence, this
extended follow-up is needed to distinguish earlier diagnosis from overdiagnosis.
If allowance is not made for such catch up, the extra cancers diagnosed in the screened
group include some that would also have emerged without screening, albeit later. In
principle, the extended period of follow-up should correspond to the lead time, but
the average lead time is also the subject of debate, and the lead time is not the
same for all cancers. As follow-up is extended well beyond the screening period, new
cancers in both the screened and unscreened groups will be included regardless of
screening, and the ratio of total numbers of diagnosed cancers will converge towards
one (Puliti et al, 2012). An ideal follow-up would be to the end of women's lives.
However, pragmatically an adequate follow-up is perhaps 5–10 years after the end of
the intervention period (Biesheuvel et al, 2007; Puliti et al, 2011). The trials that
clearly did not invite the control group for screening at the end of the screening
phase were the two Canadian trials and the Malmö I trial for women aged 55–69 years
(Miller et al, 2000, 2002; Zackrisson et al, 2006).
In the other RCTs, all the women in the control group were offered screening at the
end of the active period of the trial. Estimates of overdiagnosis from these trials
are problematic. Screening of women in the control group might itself be expected
to lead to some overdiagnosis, and thus to an overall underestimate of overdiagnosis.
Exclusion of cancers diagnosed at the end-of-trial screening of the control group
would overestimate overdiagnosis, as the control women have not been followed long
enough.
Besides the RCTs, there are many non-randomised (observational) studies that have
attempted to estimate overdiagnosis. These studies raise many concerns, according
to the study design, with the key concern being the likely non-comparability of groups,
for example, in different geographical areas. As one contributor to overdiagnosis
is the development of other diseases leading to death, the risk of overdiagnosis might
be age-dependent. Estimates of overdiagnosis may thus be affected by the age distribution
of the screened group. For non-RCTs it is especially important that age distributions
are comparable.
4.3 Estimating overdiagnosis
Overdiagnosis can be estimated by comparing the incidence of breast cancer in cohorts
of screened and unscreened women who were followed for several years. Unfortunately,
although there is agreement on the concept of overdiagnosis, there has been a wide
divergence of views on how to estimate the amount of overdiagnosis, with the result
that estimates of the frequency of overdiagnosis vary widely, from about 0 to 50%.
The estimated amount of overdiagnosis depends greatly on the way the calculation is
made, and many different methods exist. De Gelder et al (2011) (Appendix 4) described
seven approaches, all of which have been applied in recent publications. The differences
relate to which cases are included in the numerator and, especially, to the choice
of denominator. The rate of overdiagnosis can be considered in relation to women invited
to be screened, women actually screened, or cancers actually detected by screening.
It can also relate to lifetime or the screening age range. It can be expressed as
a percentage of the cancers diagnosed in the screening group or as the percentage
excess over that seen in the unscreened group. Also, it can be expressed as a relative
increase or an absolute increase. Clearly, the different estimates address different
questions. Understanding published estimates of overdiagnosis percentages requires
identification of exactly how those estimates were derived.
The panel believes that there is no single best way to estimate overdiagnosis. For
RCTs, the main options are:
From the population perspective, the proportion of all cancers diagnosed during the
screening period and for the rest of the woman's lifetime in women invited to screening
that are overdiagnosed (not including any diagnosed before the age of screening). This
probability can be estimated using the difference in cumulative numbers of newly diagnosed
breast cancers in groups invited or not invited to be screened, expressed either as
a percentage of the number of cancers in the control group (excess risk) or as a percentage
of the number of cancers in the screening group (proportional risk). This probability
will diminish over time as the number of newly diagnosed cancers increases in both
groups.
From the perspective of a woman invited to be screened, the probability that a cancer
diagnosed during the screening period represents overdiagnosis (Welch et al, 2006;
Harris et al, 2011). This probability can be estimated using the difference in cumulative
numbers of newly diagnosed breast cancers in groups invited or not invited to be screened,
expressed as a percentage of the cancers diagnosed during the screening phase of the
trial for women in the invited group. The cases in the invited group can also be restricted
to those actually detected at a screening visit – that is, excluding interval cancers
or cancers among women who did not attend for screening.
These approaches use the same numerator but varying denominators. The panel considers
that the appropriate calculations should include DCIS cases, but notes that some studies
have reported estimates of overdiagnosis in relation to invasive cancers only.
The panel illustrates how different approaches yield various estimates using data
from the Malmö trial (Andersson et al, 1988; Zackrisson et al, 2006), partly following
Welch (Welch et al, 2006; Welch and Black, 2010). All cancers, both invasive and non-invasive
DCIS, are considered. Also, for transparency, the calculations are expressed in terms
of numbers of women whereas some authors have reported rates per 1000 woman-years
of follow-up.
The Malmö I trial included women aged 45–69 at entry. Cancer incidence was reported
after an average of 15 years of follow-up (to December 2001) (Zackrisson et al, 2006).
In the active screening period up to 1990, there were 741 cancers diagnosed
in the screening group and 591 in the control group, an excess of 150. In the period
from 1990 to 2001, a further 579 and 614 new cancers were diagnosed, respectively,
showing a catching up of 35 cancers. The total numbers of cancers in the screened
and control groups were 1320 and 1205, respectively, showing an overall excess of
115 cancers diagnosed among screened women. Zackrisson et al (2006) reported a RR
of 1.10 and interpreted these data as showing an estimated overdiagnosis of 10% (95%
CI 1–18%). Reporting such a percentage requires consideration of the denominator:
10% of what (Fletcher, 2011)? In fact, the figure of 10% represents the estimated
excess risk of a diagnosis of breast cancer among women who had been invited to be
screened, and were followed for 15 years after the trial ended. The figure of 10%
thus addresses the first key question stated above – population impact.
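The arithmetic in this paragraph can be checked directly. The sketch below uses only
the Malmö I counts quoted above; the variable names are illustrative, and the ratio
shown is a crude ratio of case counts that ignores any difference in arm size or
person-time, so it only approximates the published RR.

```python
# Malmö I cancer counts as quoted in the text (Zackrisson et al, 2006).
screened = {"screening_period": 741, "after_screening": 579}
control = {"screening_period": 591, "after_screening": 614}

# Excess in the screened arm while screening was active.
excess_during_screening = screened["screening_period"] - control["screening_period"]
# Catch-up: extra cancers diagnosed in the control arm after screening stopped.
catch_up = control["after_screening"] - screened["after_screening"]

total_screened = sum(screened.values())
total_control = sum(control.values())
overall_excess = total_screened - total_control
crude_ratio = total_screened / total_control  # assumption: arms of similar size

print(excess_during_screening, catch_up, overall_excess)  # 150 35 115
print(total_screened, total_control)                      # 1320 1205
print(f"{crude_ratio:.2f}")                               # 1.10
```

The overall excess of 115 is the 150 excess during screening less the 35 cancers by
which the control group caught up afterwards.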
The panel calculated four estimates of percentage overdiagnosis from the Malmö I trial
(Table 2A). The younger women (age 45–54) were offered screening at the end of the
study period so the estimates are shown both for all women (age 45–69 at enrolment)
and only for women aged 55–69. Different definitions of overdiagnosis lead to estimates
ranging from 9 to 29%, although they are based on the same trial.
To answer the second key question – from the perspective of a woman being screened,
what is the probability that a cancer diagnosed during the screening period represents
overdiagnosis – it is important to include screen-detected cancers and interval cancers.
Among women being screened, whether in a trial or a routine screening programme, not
all of the diagnosed cancers will be detected at the routine screening; many cancers
will be picked up between screens, as ‘interval' cancers and might have presented
symptomatically in the absence of screening. The relative proportion of interval to
screen-detected cancers will increase as the screening interval increases (Breast
Screening Frequency Trial Group, 2002) – in general more screen-detected cancers implies
fewer interval cancers – so excluding interval cancers will give an estimate of overdiagnosis
that depends on screening frequency. Further, clinical experience suggests that suspicion
of cancer may encourage a woman to accept the invitation to screen. For the second key
question, the risk of overdiagnosis among women invited for screening, the panel therefore
prefers to use as the denominator the number of cancers diagnosed in invited
women throughout the period of screening.
4.4 Estimates of overdiagnosis
The literature on overdiagnosis has been reviewed by several authors since 2005. They
used different study inclusion criteria, but gave most attention to data from RCTs.
Moss (2005) calculated overdiagnosis for eight RCTs as did Gøtzsche (2004) for six
of the same trials. Biesheuvel et al (2007) reviewed the literature with particular
attention given to the RCTs and the two former reviews. Recently, Puliti et al (2012)
reviewed the European literature covering observational studies. Biesheuvel and Puliti
both considered the issue of bias in each of the studies, specifically in relation
to adjustment for lead time and case-mix.
Moss (2005) and Gøtzsche (2004) produced very different estimates of overdiagnosis
from the same trials. Biesheuvel et al (2007) converted all their estimates to a common
measure of overdiagnosis (method A described below), but important discrepancies remained.
Biesheuvel et al (2007) reported that in the studies they considered least biased,
overdiagnosis estimates ranged from −4 to 7.1% for women aged 40–49 years, 1.7 to
54% for women aged 50–59 years, and 7 to 21% for women aged 60–69 years (Biesheuvel
et al, 2007). Similar large variations have been seen in the estimates of overdiagnosis
from observational studies (Puliti et al, 2012). Some of the variation seen in these
age-specific estimates stems from very small numbers of cases within age groups within
trials.
Given the wide variation in both the methods used and the estimates obtained, the
panel calculated four estimates of percentage overdiagnosis:
A. Excess cancers as a proportion of cancers diagnosed over whole follow-up period
in unscreened women
B. Excess cancers as a proportion of cancers diagnosed over whole follow-up period
in women invited for screening
C. Excess cancers as a proportion of cancers diagnosed during screening period in
women invited for screening
D. Excess cancers as a proportion of cancers detected at screening in women invited
for screening
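As a rough illustration of how the choice of denominator drives the result, the sketch
below applies methods A to C to the Malmö I counts for all women aged 45–69 quoted
in section 4.3 (1320 and 1205 cancers over the whole follow-up, 741 in the invited
group during the screening period). The function and argument names are hypothetical.
Method D needs the number of cancers detected at screening, which is not given in
the text, so it is left unevaluated. These all-ages figures are not the estimates
for women aged 55–69.

```python
def overdiagnosis_estimates(screened_total, control_total,
                            screened_during_screening, screen_detected=None):
    """Percentage overdiagnosis under methods A-D (D is None without its denominator)."""
    excess = screened_total - control_total  # same numerator for every method
    return {
        "A": 100 * excess / control_total,              # whole follow-up, unscreened women
        "B": 100 * excess / screened_total,             # whole follow-up, invited women
        "C": 100 * excess / screened_during_screening,  # screening period, invited women
        "D": None if screen_detected is None else 100 * excess / screen_detected,
    }

# Malmö I, all women aged 45-69 at entry (counts quoted in section 4.3).
malmo_all_ages = overdiagnosis_estimates(1320, 1205, 741)
for method, value in malmo_all_ages.items():
    print(method, "n/a" if value is None else f"{value:.1f}%")
# A 9.5%
# B 8.7%
# C 15.5%
# D n/a
```

The same 115 excess cancers yield estimates from under 9% to over 15% purely through
the choice of denominator.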
RCTs without screening of control group at the end of the trial
The most reliable estimates of overdiagnosis come from RCTs in which women in the
control group were not offered screening at the end of the trial. Three trials clearly
meet this criterion: Malmö I, for women aged 55–69 years, and the two Canadian trials
that screened women for 5 years and reported follow-up data at 11 years (i.e., about
6 years after the end of screening; Miller et al, 2000, 2002). The estimates of overdiagnosis
from these two trials were quite similar to those from Malmö I.
The situation with the HIP study was less clear from the available literature, so
the panel excluded this study for the purposes of the estimate of overdiagnosis. In
addition, the panel had difficulty from the published literature extracting the data
on the numbers of cancer cases in the two arms using the same definition of cases
as the other three studies. In particular, the first report of the HIP study included
both DCIS and lobular cancer in situ (LCIS) in the non-invasive cases (Shapiro, 1977;
Shapiro et al, 1982), but thereafter we could not determine whether LCIS cases had
been included in the subsequent incidence data, nor whether non-invasive cases had
been included in the process of cross-checking with the New York Cancer registry data
and National Death index (Chu et al, 1988). Estimates of overdiagnosis from the Malmö
I and the two Canadian trials using the four methods already described are shown in
Table 2B. The estimates from the three RCTs are quite similar.
Opportunistic screening in the control group would lead to an underestimate of overdiagnosis.
In the Malmö and Canadian trials, about 25% (26% and 17%, respectively, in the two
Canadian trials) of the women in the control group reported having received a mammogram
both during the active trial period and follow-up period. No allowance has been made
in the above calculations for that effect.
All four methods use the same numerator, derived from the difference in newly diagnosed
cases of breast cancer in the group invited for screening and the control group. Methods
A and B differ in whether they compare the excess against the number of cancers diagnosed
in the control group or the screening group. Many published estimates use the former
(method A).
None of the methods are wrong – they just address different questions. The panel's
preferred measures are method B to address the population perspective and method C
for the perspective of an individual woman. Figure 3C shows the results from random
effects meta-analyses for these two estimates of overdiagnosis.
As many have noted, these three RCTs offer the most reliable evidence for an estimate
of overdiagnosis. The combined data suggest a risk of overdiagnosis of about 11% with
a population perspective and 19% from the individual woman's perspective.
The panel considers the data consistent with overdiagnosis of about 5–15% from the
population perspective and 15–25% from the individual woman's perspective. These estimates
are subject to the same sources of uncertainty as noted for the estimates of mortality
from the RCTs. In addition, the estimates are not tailored to the UK screening scheme
or a 20-year screening period.
In total, these three trials included only 1200 cancers diagnosed during the screening
period of which an estimated 243 were overdiagnosed. Given these small numbers, it
is important to consider other estimates from other RCTs and the higher quality observational
studies. However, those studies clearly provide less reliable estimates.
RCTs with screening of control group at the end of the trial
In several RCTs, all the women in the control group were offered screening at the
end of the active phase of the trial. Estimates of overdiagnosis from these trials
are problematic. Exclusion of cancers detected at the end-of-trial screening of the
control group would overestimate overdiagnosis, as the control women have not been
followed long enough. Such an effect is clearly seen in the RCTs without end-of-trial
screening. On the other hand, inclusion of cases detected at the end-of-trial screen
of women in the control group means that screening is not being compared with no screening.
Also, some of the cancers detected by that screen would themselves be overdiagnosed.
Thus, including these cancers would lead to an underestimate of overdiagnosis.
Although for several trials both calculations just described are possible, the estimates
obtained generally vary widely. For example, for the Stockholm trial using method
B, the estimate of overdiagnosis varies from −2.6% when all diagnosed cancers are included to +39%
if cancers detected at the end-of-trial screen of the control group are excluded.
Although it is reasonable to believe that these two estimates bracket the desired
answer (had there been no extra screen and with extended follow-up), the panel believes
it is impossible to get useful and reliable estimates of overdiagnosis from these
trials. An alternative approach is to estimate the effect of lead time and adjust
for it. That approach makes very strong, unverifiable assumptions, and the panel is
not persuaded that such an adjustment can be made reliably.
Observational studies
Overdiagnosis can be estimated from some non-RCTs, but as always with observational
studies there are serious concerns about comparability. Numerous observational studies
have adopted a variety of study designs to compare screened and unscreened women or,
more often, women who were or were not invited to screening.
There is a considerable body of literature examining the effects of screening in populations
and trying to assess the degree of overdiagnosis. Even in the absence of screening,
breast cancer incidence rates are not stable over time in populations, and the wide
variation in quoted overdiagnosis rates reflects this variation as well as different
lengths of follow-up, different statistical assumptions, and different ways of accounting
for lead time.
When screening is introduced there will be a short-term rise in the incidence of newly
diagnosed cancers. If that rise is solely due to advancing the time when some cancers
are diagnosed, the incidence should fall back to pre-screening levels after some years.
A failure to do so may be interpreted as evidence of a degree of overdiagnosis (Esserman
et al, 2009). Time trends can also be examined for women of different age groups:
before, during, and after the screening programme age range. Such data are shown in
Figure 1A in section 2 for breast cancer incidence in the United Kingdom. The increase
in incidence associated with the introduction of population screening is clearly seen,
first for women aged 50–64 and later for women aged 65–69.
Some studies have compared post-screening incidence with a projection of previous
incidence trends in the screened population. Those studies have resulted in very different
estimates of overdiagnosis. The panel asked Cancer Research UK to review a set of
plausible assumptions made in the literature and to produce estimates based on these
assumptions (Jørgensen and Gøtzsche, 2009a; Duffy et al, 2010). The panel found that
by changing each of the assumptions, one could get a vast range of estimates of overdiagnosis
(Appendix 6). The modelling produced a range of estimates of overdiagnosis attributable
to the current NHS breast screening programme in England, from 0 to >6550 women (aged
⩾45) per year. Ten per cent of the results were <1150 and ten per cent
>4115. As there appears to be no a priori reason to favour one set of assumptions
over another, the panel does not think that approaches based on extrapolation offer
a robust method to estimate overdiagnosis.
Several groups have compared breast cancer incidence trends over time in screened
and unscreened countries or regions over the same time period (Jørgensen and Gøtzsche,
2009). The difficulty with these studies is distinguishing true overdiagnosis from
the excess incidence of breast cancer that results from screening bringing forward
the time of diagnosis. Given that overdiagnosis is defined as a cancer that would
not have come to attention in the woman's life span, long follow-up after cessation
of screening is essential. The difficulties can be illustrated by studies of comparisons
of incidence rates in regions within a single country that did or did not introduce
population screening. A study from Denmark is illustrative, as only 20% of the Danish
population was offered organised mammography screening over a long time-period (Jørgensen
et al, 2009). Screening was introduced in Copenhagen in 1991 and in Funen in 1993
for women aged 50–69. The authors noted that the population in those areas has distributions
of age and socioeconomic status comparable with the rest of Denmark.
Table 2C shows the numbers of breast cancers diagnosed per 100 000 women in screened
and non-screened areas of Denmark for 20 years before and 13 years after the introduction
of screening in 1991. Incidence rates of breast cancer were higher in the screened
areas than in the non-screened areas before screening began, suggesting some non-comparability
of the areas. During the 13 years of screening, the incidence in women aged 50–69
rose both in the screened areas and the non-screened areas, but more in the screened
areas. Incidence also rose in women aged 70–79. One way to estimate overdiagnosis
is to compare the ratio of new cancers in screened and unscreened groups in the two
periods. In the pre-screening period, the ratio was 1.08 (214/198) and for the screening
period it was 1.35 (386/286). The authors say that these data indicate 35% overdiagnosis,
but if we adjust for the pre-screening difference the excess is 25% (1.35/1.08=1.25).
These simple calculations ignore the underlying rise in cancer incidence throughout
the period. The authors used regression modelling to take account of incidence trends
and age differences, giving an estimate of 33%. As noted earlier, such analyses make
additional assumptions that are not verifiable. Studies such as this do not indicate
the likely effect of long-term follow-up in reducing the excess in the incidence rate
in the screened compared with the unscreened populations.
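The simple ratio calculation for the Danish data can be reproduced as follows. The
sketch uses only the four rates per 100 000 women quoted above; the variable names
are illustrative, and, as the text notes, the calculation ignores the underlying
rise in incidence and the effect of longer follow-up.

```python
# Breast cancer incidence per 100 000 women, as quoted in the text from Table 2C.
pre_screening = {"screened_areas": 214, "non_screened_areas": 198}
during_screening = {"screened_areas": 386, "non_screened_areas": 286}

ratio_pre = pre_screening["screened_areas"] / pre_screening["non_screened_areas"]
ratio_during = during_screening["screened_areas"] / during_screening["non_screened_areas"]

# Dividing by the pre-screening ratio removes the baseline difference between areas.
adjusted_ratio = ratio_during / ratio_pre

print(f"{ratio_pre:.2f} {ratio_during:.2f} {adjusted_ratio:.2f}")  # 1.08 1.35 1.25
```

The unadjusted ratio corresponds to the 35% figure and the adjusted ratio to the 25%
figure given above.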
There have been many other observational studies, but most have the type of problem
illustrated here in distinguishing overdiagnosis from the expected increase in breast
cancer incidence due to screening and require many assumptions to derive estimates
of overdiagnosis. A recent review of 13 observational studies showed overdiagnosis
to vary in the range of 0–54%. Adjustment for lead time and breast cancer risk yielded
overdiagnosis estimates in the range of 1–10% (Puliti et al, 2012).
The panel's judgement is that the best estimates will come from long-term follow-up
of RCTs, as reviewed above.
Statistical and other uncertainties
As noted in section 3, it is conventional that results from statistical analyses,
including meta-analyses, are presented with a measure of statistical uncertainty such
as 95% confidence limits. Although these are helpful in giving an impression of the
possible influence of the play of chance (given the sample sizes that are available
in the studies considered), they fail to represent the uncertainties due to possible
biases (internal validity of the studies) or to generalisation from the studies to
a new context (external validity). So the CIs given for the estimated percentage overdiagnosis
are an understatement of the uncertainty about the risk of overdiagnosis associated
with the UK screening programmes. Estimates of overdiagnosis have additional uncertainties
relating to which estimate to use, and the data are not available for all studies
to calculate overdiagnosis in the suggested ways.
Conclusion
The panel believes that overdiagnosis occurs, and that women need to be aware that
screening carries a risk of detecting cancers, invasive and in situ, which would not
have troubled them in their lifetime. Tumours that represent overdiagnosis cannot
be identified clinically and so will have to be managed according to current clinical
protocols.
The panel considers that the data from three of the RCTs without end-of-trial screening
of controls provide the most reliable estimates of the extent of overdiagnosis, but
notes that there is a rather limited amount of data and numerical estimates are subject
to several uncertainties in common with estimates of mortality benefit.
As noted for the estimated benefit for mortality (see section 3.2), the overdiagnosis
rates estimated from old RCTs may not reflect those in current screening programmes.
There is, however, no clear evidence to suggest that the current rate of overdiagnosis
would be lower or higher than in the original trials. The panel thinks that the best
estimate of overdiagnosis for a population invited to be screened is of the order
of 11%, defined as the percentage excess incidence in the screening population above
the long-term expected incidence in the absence of screening.
An alternative definition addresses the answer to the question ‘if I am invited to
enter into the screening programme and am given a cancer diagnosis during the screening
period, what is the likelihood of overdiagnosis?’ The panel views the evidence as
suggesting that this probability is of the order of 19%.
4.5 Consequences of overdiagnosis
As previously stated, detection of overdiagnosed cancers turns women into patients,
leads to surgery and other treatments that are not therapeutically beneficial for
these women and can cause harm, and adversely affects their quality of life. As cancers
that would not go on to cause cancer death cannot be individually identified, they
are treated according to the current treatment protocols. Figure 3D summarises the
management of UK screen-detected cancers, both invasive and non-invasive, in 2010/2011
(NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer
Intelligence Unit, 2012).
One cannot, however, assume that the overdiagnosed cancers would be managed in the
same proportional way as the generality of screen-detected cancers. That the patient
dies before the cancer would have presented clinically implies that such tumours:
would tend to be more slowly growing, as a more rapidly growing tumour would be more
likely to present clinically within a shorter time-frame;
would be relatively small, as larger tumours would be more likely to present symptomatically.
Thus, overdiagnosed cancers would tend to be more likely to be:
DCIS (and the relative excess of DCIS in screen-detected cancers would support this),
and possibly more likely to be low/intermediate rather than high grade.
Grade 1 or grade 2 invasive rather than grade 3.
Thus, compared with the diagram, patients with cancers that are overdiagnosed would
be:
relatively more likely to have been treated on the DCIS side than the invasive; and
as more likely to be low/intermediate grade, less likely to have had radiotherapy;
if invasive, more likely to be managed by WLE and radiotherapy than mastectomy, as
likely to be small;
if an invasive cancer, less likely to have had chemotherapy, as patients having chemotherapy
are more likely to have had grade 3 and/or node-positive cancers (NHS Breast Screening
Programme & Association of Breast Surgery-West Midlands Cancer Intelligence Unit,
2012);
if an invasive cancer, more likely to have had endocrine therapy, as oestrogen positivity
is associated with older age and lower grade invasive cancers.
Evidence in support of this tendency for overdiagnosed cancers to be of potentially
better prognosis, and thus given less aggressive therapy, can be seen, for example,
in the reports of the nature of cancers found in the two arms of randomised screening
trials. Table 2D shows such data for the Malmö I trial.
4.6 Ductal carcinoma in situ (DCIS)
There is evidence that breast screening has led to an increase in the identification
of DCIS (IARC, 2002). It has been suggested that DCIS is a relatively benign condition
that would not cause harm, and therefore diagnosis of DCIS contributes significantly
to the magnitude of overdiagnosis.
Definition
DCIS is a malignant process that arises from the epithelial tissues of the breast,
and consists of neoplastic cells, which do not, however, infiltrate beyond the limiting
basement membrane, and thus remain within the ducts where they arose. Classification
is based on the morphological features: architectural growth pattern and the cytological
characteristics of the malignant cells. It is usually grouped by grade into high,
intermediate, or low grade (IARC, 2002). Along with LCIS, it is classified as non-invasive
breast cancer, and although the cells have the appearance of malignancy, they do not
show invasiveness, so carcinoma in situ is not in itself a life-threatening condition.
The concern is that at least some have the capacity to progress to invasive malignancy.
DCIS is most commonly detected mammographically as microcalcification. Less commonly,
DCIS will present with a symptomatic lump.
Incidence
Table 2E, adapted from ‘The non-invasive breast cancer report’ (National Cancer Intelligence
Network, 2011), shows the frequency of non-invasive breast cancer for different age
groups and presentations in England for the two years 2006 and 2007.
The majority (about 90%) of non-invasive cancers diagnosed are DCIS. It is apparent
that the majority are screen-detected but, nevertheless, 38% were diagnosed symptomatically.
Some of the symptomatic tumours may have been detected incidentally when patients
presented with a different problem (e.g. microcalcifications found in the contralateral
breast when the woman has presented with a benign problem in the one breast). Thus,
the detection and management of non-invasive disease is not exclusively a problem
of the screening programme. Nevertheless, within the screening age group (age 50–70),
the majority (79%) of the DCIS is screen-detected. For 2009–2010, of all screen-detected
cancers, about one in five were non-invasive, being a little higher (24%) for the
prevalent round and lower (19%) for the incident rounds (The NHS Information Centre).
Thus, a mammographic screening programme will detect DCIS. In some cases (about one
in five; Evans, 2012), investigation of what is radiologically DCIS will lead to the
detection of an invasive carcinoma – the larger the area of DCIS, the more likely
that there will be a frankly invasive component.
Natural history of DCIS
Before introduction of the screening programme, DCIS was a relatively uncommon tumour.
Since it is frequently a marker of associated invasive cancer, it has been investigated
and usually excised, and hence it is not possible to know what would have happened
if it had been left undisturbed and untreated. Given that the screening programme
is diagnosing much more DCIS than presents symptomatically, the relevant questions
are:
How common is DCIS?
As above, it represents about 1 in 5 of screen-detected cancers, but only 1 in 20
of all symptomatic cases (National Cancer Intelligence Network, 2011). In reports
of small series (IARC, 2002) of women without known breast cancer who underwent postmortems
(hospital-based or forensic), invasive cancer was found in about 1% and DCIS in 9%,
but there was wide variation in the series, presumably reflecting differences in the
women selected and methodologies for examining the breast.
How often does it progress to invasive cancer?
The data from trials of therapy (radiotherapy and/or tamoxifen) after WLE of DCIS
show that both interventions reduce the risk of local relapse (similar to the findings
for invasive cancer after WLE). Relevant to the UK screening programme is the UK,
Australia, New Zealand (UK/ANZ) trial (Cuzick et al, 2011), in which after WLE of
screen-detected DCIS, without any further treatment, relapse in the breast occurred
in about 19% of cases, in half of which the relapse was invasive. Progression appears
to occur slowly – for example, one series of screen-detected DCIS (Wallis et al, 2012)
showed the median time to invasive progression for high-grade DCIS was 76 months,
and for low/intermediate grade 131 months.
Is there any way of identifying those cases of DCIS that will or will not progress/relapse
as invasive cancer?
DCIS is classified histologically on the basis of excised specimens, and there is
currently no certain means of identifying lesions that would not progress. The risk
of invasive relapse is higher with high- or intermediate-grade DCIS. Low-grade DCIS
seems to pursue a more indolent course, and when invasive relapse occurs it is likely
to be a grade-1 tumour. There is ongoing work (Pinder et al, 2010; Reeves et al, 2012)
looking at histological and molecular markers to identify those most likely to progress,
especially to invasive disease.
Does DCIS affect survival?
The follow-up of patients with DCIS usually shows excellent survival. For example,
in the UK/ANZ trial of 1701 women with a median follow-up of 12.7 years, only 179
(11%) had died, of whom 39 (2% of all cases) died of breast cancer. Long-term follow-up
(NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer
Intelligence Unit, 2012) of 1603 cases of screen-detected non-invasive breast cancer
(nearly all DCIS) showed a 20-year relative survival of 97.2% (95% CI 93.6–100.6),
with 7.2% of the 493 deaths being due to breast cancer. However, these series are
of patients who have had the DCIS treated: what is unclear is what the risk of dying
of breast cancer would have been had it been left untreated.
Conclusions
The main question is whether DCIS is a marker of malignancy requiring active treatment
or a benign condition of no clinical significance. On the one hand, DCIS (particularly
high grade) can certainly serve as a marker for invasive cancer – either because it
is associated with the presence of invasive disease at the time of detection, or because
its presence indicates an increased risk of invasive disease developing subsequently
– in about 10% of cases at 10 years after WLE only. On the other hand, autopsy series
and screening programmes both demonstrate that DCIS can be found in the breast of
middle-aged women at a greater frequency than presents symptomatically.
Part of the explanation is time. Breast cancer has a long natural history and in patients
with invasive cancer, the evolution of metastatic spread and ultimate death may take
place over decades. If one also considers the progression of DCIS to invasive cancer
as part of this process, the evolution is even longer. In other words, the relevant
question is not whether DCIS progresses to invasive cancer (it can), but whether it
might have progressed to an invasive cancer that causes symptoms within the lifetime
of the women concerned. This will depend mainly on the age of the woman, her life
expectancy at the point of diagnosis, and perhaps other factors that could affect
progression (hormonal exposure, obesity, etc.). Current series do not show a significant
impact of DCIS on survival, after treatment, even at 20 years, but increasing survival
may mean that for women in their 50s and even 60s, the diagnosis of DCIS may impact
on their long-term survival. Long-term data are needed.
Thus, in diagnosing DCIS via a screening programme, there is a balance to be struck
between the potential benefits for some women of identifying and treating a pre-invasive
cancer, and the risks for others of treating something that would never have affected
the woman in her lifetime. It is not simply the case that DCIS represents overdiagnosis,
although it undoubtedly is a contribution to the cases of overdiagnosis.
5. Other considerations
5.1 Introduction
Beside the benefit of breast screening for mortality and its harm in terms of overdiagnosis,
the panel considered other relevant issues. These include additional harms through
invitation, screening, diagnosis, and treatment, as well as women's perceptions and
cost effectiveness. Although the panel has not made a systematic appraisal of evidence
in all these areas, being outside its terms of reference (Appendix 1), it has drawn
together comments on each of these issues as they should not be neglected when considering
the overall impacts of breast screening.
5.2 Harms associated with breast screening
Mammography
Radiation exposure
Mammography uses X-rays and thus exposes women to very low doses of ionising radiation
that could cause breast cancers. The actual dose of radiation depends on several factors
including the number of views of each breast and whether film or digital mammography
is used.
The Health Protection Agency (Health Protection Agency 2001) has suggested that the
lifetime additional cancer risk for each mammography examination is between 1 in 100 000
and 1 in 10 000.
Although these doses are lower than those for which cancer is directly induced (Preston
et al, 2002), screening a large population on a regular basis may cause harm. The
NHS Breast Screening Programme (2011) in 2006 stated that for every 14 000 women in
the age range 50–70 years screened by the NHSBSP three times over a 10-year period,
the associated exposure to X-rays will induce about one potentially fatal breast cancer
(NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer
Intelligence Unit, 2012). A more recent estimate is that screening women every 3 years
from age 47–73 would cause 3–6 cancers per 10 000 women screened (Berrington de Gonzalez,
2011). This risk is incorporated in estimates of the benefit of screening (see section
3). Digital mammography, which uses a lower radiation dose, is increasingly being
used in the English screening programme. Therefore, it is likely that the risk of
exposure will be reduced.
Pain
During the process of mammography, the breast is compressed and flattened in order
to create a uniform density, which improves the image and reduces the radiation dose.
A substantial proportion of women find this painful and some studies (Nelson et al,
2009; Gøtzsche and Nielsen, 2011) have shown that the pain and discomfort of mammography
deters them from attending for further screening (Gøtzsche and Nielsen, 2011).
The assessment process
Figure 4 summarises the process and numbers for women recalled after routine screening
mammograms.
Many women take part in the screening programme; it is often argued that for many
the benefit will be reassurance (Welch et al, 2011). With that reassurance, however,
must come the knowledge that all screening tests have errors of false positives and
false negatives. The mammogram may sometimes appear to show an abnormality that requires
further investigation to determine whether or not it is a cancer requiring treatment,
or it may fail to detect a cancer that is present.
False-positive mammogram
In Figure 4, 2522 women (the 3105 recalled minus the 583 diagnosed with cancer, that is,
3.36% of all the women screened) were recalled and found not to have cancer. This
is called a false-positive result. Of the women recalled and found not to have cancer,
the majority (1744/2522=69%) had only further imaging (mammography, ultrasound) but
a minority (778/2522=31%) had a biopsy, which was core biopsy under local anaesthetic
in all except 2.3% (57/2522) who had a formal biopsy under general anaesthetic. The
latter group represents only 0.076% (57/75 057) of all women screened.
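The recall percentages above follow directly from the counts quoted from Figure 4. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Counts quoted in the text from Figure 4
women_screened = 75_057
recalled = 3_105
cancers_diagnosed = 583
imaging_only = 1_744   # further mammography or ultrasound only
any_biopsy = 778       # core or open biopsy
open_biopsy = 57       # formal biopsy under general anaesthetic

def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole

# Recalled women who turned out not to have cancer
false_positives = recalled - cancers_diagnosed

print(f"false positives:        {false_positives}")
print(f"share of all screened:  {pct(false_positives, women_screened):.2f}%")
print(f"imaging only:           {pct(imaging_only, false_positives):.0f}%")
print(f"any biopsy:             {pct(any_biopsy, false_positives):.0f}%")
print(f"open biopsy (of FPs):   {pct(open_biopsy, false_positives):.1f}%")
print(f"open biopsy (of all):   {pct(open_biopsy, women_screened):.3f}%")
```

The imaging-only and biopsy groups sum to the false-positive total, which is a useful consistency check on the quoted counts.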
Numerous studies have assessed the psychological impact of a false-positive result
on women (Brett et al, 1998; Brett and Austoker, 2001). The studies' results are conflicting
but a recent systematic review of the literature (Bond et al, 2012) concluded that,
in the population at general risk of breast cancer, a false-positive result can cause
breast cancer-specific psychological distress, which may endure for up to 3 years.
The degree of distress is associated with the level of invasiveness of subsequent
assessment. Some studies found that the distress caused by a false-positive result
deterred some women from re-attending for breast screening, which would reduce any
benefit they would otherwise have got from being offered screening in the first place.
The level of distress can be mitigated by providing women with clearly worded information
about the recall and appropriate support from clinical staff before and during
assessment (Bond et al, 2012).
False-negative results
No screening test is completely accurate and sometimes mammography will not detect
a cancer. This may be because the cancer is not mammographically visible or develops
between screening rounds; women are warned of this possibility in the screening
literature. When women present with an interval cancer, the previous mammograms are
reviewed blind to assess whether a suspicious abnormality was visible on the previous
screening mammogram. If so, such cases are classified as a true false-negative mammogram,
that is, the suspicious abnormality was missed at the first screen. For women attending
at three-yearly intervals, the false-negative rate is 0.2/1000 women screened (Lawrence,
2012; c.f. the cancer detection rate by screening of 7.8 cancers/1000 women screened).
Diagnostic testing
Core biopsy carries a risk of local haemorrhage and, rarely, reaction to local anaesthetics.
Open surgical biopsy involves a general anaesthetic but it is regarded as a low-risk
procedure.
Psychological consequences of a positive diagnosis
The psychological consequences of a breast cancer diagnosis and subsequent treatment
have been well documented. In terms of harms of screening, these consequences are
particularly relevant to those women who have been overdiagnosed. Although these women
will not know that the cancer would not have caused them any harm, they will have suffered
unnecessary psychological trauma associated with a cancer diagnosis.
Two studies (Yousaf et al, 2005; Schairer et al, 2006) have shown a small but significant
increased risk of suicide in patients diagnosed with breast cancer. The risk increases
with advancing stage of the disease and therefore may be less relevant for those who
are overdiagnosed. However, two further studies (Jamison et al, 1978; de Leo et al,
1991) have found suicidal ideation to be present in some patients post-mastectomy.
Although these risks are small they should not be overlooked when assessing the benefits
and harms of breast screening.
Potential morbidity and mortality from treatment
Breast surgery
As with any surgical procedure, there are hazards from the anaesthetic and the surgical
procedure itself. Although the surgery can be extensive (especially if it involves
reconstructive surgery as well), the surgery is elective, patients are assessed pre-operatively,
and serious complications are rare. The most extensive surgery is mastectomy and reconstruction
for which the mortality is estimated to be <0.3% (The NHS Information Centre). In
contrast, following mastectomy, 10% of patients will have some sort of complication
(e.g. infection, fluid accumulation) (The NHS Information Centre).
Radiotherapy
Acutely, radiotherapy can cause skin reactions and uncommonly radiation pneumonitis.
Both of these are short-lived and usually not severe.
Radiotherapy can cause other long-term harms (Early Breast Cancer Trialists' Collaborative
Group (EBCTCG), 2005). There is, at 15 years, a small excess risk of non-breast cancer
mortality (15.9 vs 14.6%, an absolute difference of 1.3%). This is mainly due to heart
disease (so seen more in left- than right-sided cases because more of the heart is
irradiated), lung, and oesophageal cancers. These estimates are derived from trials
of radiotherapy performed mostly during or before the 1970s; since then radiotherapy
techniques have changed especially with the introduction of CT planning, so reducing
the volume of heart and lung irradiated, which should reduce, but not eliminate, such
complications. Data from the Surveillance Epidemiology and End Results (SEER) database
(Giordano et al, 2005) show that the risk of death from ischaemic heart disease due
to radiotherapy has diminished from 1973 to 1989 (risk from right-sided tumours unchanged,
left sided decreased).
The last published Oxford overview (Clarke et al, 2005) showed that there is a reduction
in mortality from the reduction in local recurrence of invasive cancer by radiotherapy.
Essentially, for every four recurrences prevented at 5 years, there will be one death
prevented at 15 years. For illustration, the local recurrence rate in the radiotherapy
START trial (in which many patients had screen-detected cancers) was 3.5% at 5 years,
which, given radiotherapy reduces local recurrence by about two-thirds, would correspond
to a 5-year local recurrence rate of about 10.5% without radiotherapy. This gain of
7% in local control should correspond to a reduction in mortality of just under 2%.
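The illustration above chains three quoted figures: the 3.5% recurrence rate with radiotherapy, the two-thirds relative reduction, and the four-to-one relationship between recurrences prevented and deaths prevented. A rough sketch of that chain follows; the function name is an assumption made here, and the inputs are rounded values from the text rather than trial-level data.

```python
def rate_without_treatment(rate_with, relative_reduction):
    """Back-calculate the untreated rate from the treated rate and the relative reduction."""
    return rate_with / (1 - relative_reduction)

rate_with_rt = 3.5          # % local recurrence at 5 years with radiotherapy
relative_reduction = 2 / 3  # radiotherapy cuts local recurrence by about two-thirds
recurrences_per_death = 4   # four recurrences prevented per death prevented at 15 years

rate_without_rt = rate_without_treatment(rate_with_rt, relative_reduction)
local_control_gain = rate_without_rt - rate_with_rt
mortality_reduction = local_control_gain / recurrences_per_death

print(f"5-year recurrence without radiotherapy: {rate_without_rt:.1f}%")
print(f"gain in local control:                  {local_control_gain:.1f}%")
print(f"implied mortality reduction at 15 y:    {mortality_reduction:.2f}%")
```

The result is sensitive to the assumed relative reduction, so it is best read as an order-of-magnitude illustration.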
Adjuvant hormone therapy
The most extensive experience is with tamoxifen. Trials of adjuvant tamoxifen for
5 years have shown that for patients with hormone receptor-positive breast cancer,
breast cancer mortality is reduced by about 33% (Early Breast Cancer Trialists' Collaborative
Group (EBCTCG) 2005), translating into an absolute reduction in mortality at 10 years
of 5.3% and 12.2% for node-negative and node-positive patients, respectively. Tamoxifen
does have some long-term hazards in that it carries an increased risk of uterine cancer
and thromboembolic disease. Their effect on mortality is of the order of 0.2% per
decade and is outweighed by the modest but positive effect of tamoxifen on ischaemic
heart disease (possibly because it lowers cholesterol) (Dewar et al, 1992). Aromatase
inhibitors are increasingly used instead of tamoxifen, but their overall effect on
mortality is very similar to that of tamoxifen.
Cytotoxic chemotherapy
Adjuvant cytotoxic chemotherapy reduces both overall and breast cancer-specific mortality.
Use of an anthracycline- or taxane-containing regime yields a RR reduction of about
one third in breast cancer mortality (Peto et al, 2012). The absolute benefit depends
on the risk profile but will often be of the order of 6–7% at 10 years. There are
acute toxicities associated with giving chemotherapy, such as alopecia, nausea and
vomiting, which are all unpleasant but non-fatal. Acute neutropenic sepsis can be
fatal but this is a rare event in the adjuvant setting. There is an increased risk
of thromboembolism. Mortality rates during adjuvant chemotherapy have been reported
at around 0.3% (Cameron et al, 2003). The main long-term risks are (Azim et al, 2011):
Cardiac: Anthracyclines can cause a cardiomyopathy, the incidence being dose related
and increasing with age. Trials suggest an absolute excess mortality of up to 1%,
but this may be an underestimate as the incidence of cardiac failure may be higher
and can occur many years after treatment.
Second cancers: The main risk with chemotherapy, particularly anthracycline-based,
appears to be acute myeloid leukaemia and myelodysplastic syndrome. At standard doses,
the risk is probably of the order of 0.5% but may be higher if the doses (especially
of alkylating agents and anthracyclines) are increased.
Neurotoxicity and premature menopause: Both are very real causes of morbidity but
not of mortality.
Conclusion
We know that within the NHS screening programmes, of patients found to have invasive
or non-invasive cancer, 99% have surgery (of whom 5.7% have mastectomy and immediate
reconstruction), 72% have radiotherapy, 72% have adjuvant hormone therapy, and 27%
adjuvant chemotherapy (NHS Breast Screening Programme & Association of Breast Surgery-West
Midlands Cancer Intelligence Unit, 2012). From the above, assuming a worst case scenario,
it would be reasonable to assume no adverse mortality effect for hormone therapy,
no net effect of radiotherapy on mortality, a maximum of 0.2 per 1000 dying because
of surgery (0.3% of those having reconstruction) and 1.3 per 1000 dying because of
chemotherapy (0.5% of the 27% who have chemotherapy), giving an adverse mortality
rate of 0.15%. For patients who have an ‘overdiagnosed’ cancer, the risk is likely
to be lower as it is unlikely that they would have received chemotherapy (see section
4).
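The worst-case figure can be rebuilt from the treatment proportions and mortality rates quoted above. This is a minimal sketch with illustrative variable names; it follows the text in assigning no adverse mortality to hormone therapy and no net mortality effect to radiotherapy.

```python
# Proportions of patients with screen-detected cancer, as quoted in the text
surgery_share = 0.99
reconstruction_share_of_surgery = 0.057  # mastectomy with immediate reconstruction
chemotherapy_share = 0.27

# Worst-case treatment mortality rates, as quoted in the text
reconstruction_mortality = 0.003  # 0.3% of those having reconstruction
chemotherapy_mortality = 0.005    # 0.5% of those having chemotherapy

# Deaths per 1000 patients with screen-detected cancer
surgery_deaths_per_1000 = (
    1000 * surgery_share * reconstruction_share_of_surgery * reconstruction_mortality
)  # about 0.17, quoted as a maximum of 0.2
chemo_deaths_per_1000 = 1000 * chemotherapy_share * chemotherapy_mortality
# about 1.35, quoted as 1.3 in the text

total_per_1000 = surgery_deaths_per_1000 + chemo_deaths_per_1000
adverse_mortality_pct = total_per_1000 / 10

print(f"surgery:      {surgery_deaths_per_1000:.2f} per 1000")
print(f"chemotherapy: {chemo_deaths_per_1000:.2f} per 1000")
print(f"total:        {adverse_mortality_pct:.2f}%")
```

Chemotherapy accounts for almost all of the total, which is why the text expects a lower risk for overdiagnosed cancers that are unlikely to be treated with chemotherapy.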
The panel concludes that the excess mortality from the investigation and treatment
of invasive breast cancer is small and outweighed by the benefits of the treatment.
For DCIS, the benefits of radiotherapy or hormone therapy are in terms of recurrence
rather than a reduction in mortality, but the absolute risks of such treatment in
terms of mortality are likely to be very small. For patients with screen-detected
breast cancer, there is no evidence that these risks are any greater than in the symptomatic
population, but for women diagnosed with a breast cancer that would never have become
symptomatic, there is nevertheless a real, but very small, mortality
risk from being screened.
5.3 Women's perceptions of screening
The development of new information to accompany cancer screening invitations was not
in scope for this review and is being dealt with separately. Women's perspectives
on overdiagnosis and whether they see it as a key issue in their screening decisions
had not previously been investigated, so Cancer Research UK commissioned some qualitative
research to investigate this. The findings, from one focus group attended by panel
members, are presented briefly here for information (Appendix 5), but academic papers,
focusing on a larger sample of qualitative research, will follow publication of this
report.
These women understood the concept of screening and most had attended. Although they
understood breast cancer, and many knew people who had had it, they had little concept
of DCIS and overdiagnosis. Their opinions are not mainly informed by the screening
leaflet, and it would appear many do not read it in detail. Thus, informing women
about screening will involve much more than simply re-writing the leaflet.
5.4 Cost-effectiveness of breast screening
It was not in the panel's remit to review the data relating to the costs or the cost-effectiveness
of breast cancer screening. The Department of Health in England has provided funds
of about £100 million per year to deliver the current screening programme (NHS Breast
Screening Programme, 2012).
If one were to take the well-founded cost-effectiveness approach such as that employed
by the National Institute for Health and Clinical Excellence (NICE) when reviewing
a health technology, it would be important to establish the costs not only of the
intervention, but of all subsequent interventions, both in those invited to be screened
and those not offered screening. No such data are available for any of the randomised
trials, and thus this panel is not in a position to consider the full costs of a breast
screening programme, including the financial costs to the NHS of any overdiagnosed
cancers.
Thus, although it has been estimated that the UK NHSBSP comes within the NICE cost/quality-adjusted
life year threshold of £20 000–30 000 (Advisory Committee on Breast Cancer Screening,
2006), the panel is not able to comment on this, as it has not been able to scrutinise
the costs of treatment with and without screening, including the costs of treating
the cancers that are overdiagnosed.
We can, however, make general comparisons with other interventions and see that, in
terms of lives saved per year, breast cancer screening is of a similar order of magnitude
as cervical screening, bowel cancer screening using faecal occult blood testing, and
the use of statins (Table 3).
6. Conclusions and recommendations
6.1 Recommendations for further research
The panel's review of the randomised trials of breast screening leads to the following
recommendations about future research priorities:
An individual participant data meta-analysis of the breast screening trials is in
progress. This should help resolve some (but not all) of the concerns that have been
raised about individual trials and their combined interpretation. The panel supports
this enterprise, and is disappointed that it had not already been done a long time
ago.
The impact of breast screening outside the ages 50–69 years is very uncertain. The
panel supports the principle of the ongoing trial in the United Kingdom for randomising
women under age 50 and above age 70 to be invited for breast screening.
The panel's review of overdiagnosis leads to their support for further research into
DCIS, in particular:
A proposed study to examine the need for treatment of low-grade DCIS
Continued support for the Sloane project, which has an extensive database of screen-detected
cases of DCIS, and the long-term follow-up of these cases may well improve our understanding
of this condition (The Sloane Project 2010).
Current mammographic screening techniques now detect many more cases of DCIS than
in the trials. The appropriate treatment of these is uncertain, because there is limited
information on their natural history (section 4.6). The panel supports studies to
elucidate the appropriate treatment of screen-detected DCIS.
Work on improved screening and pathological techniques that can predict prognosis
more effectively.
The panel also supports:
A re-evaluation of the cost-effectiveness of the NHS breast cancer screening programme
that takes into account the conclusion of this report.
6.2 Conclusions
Breast screening extends lives. The panel's review of the evidence on benefit – the
older RCTs, and those more recent observational studies judged to be relevant – points
to a 20% reduction in mortality in women invited to screening. A great deal of uncertainty
surrounds this estimate but it represents the panel's overview of the evidence. This
corresponds to one breast cancer death averted for every 235 women invited to screening,
and one death averted for every 180 women who attend screening.
The breast screening programmes in the United Kingdom, inviting women aged 50–70 every
3 years, probably prevent about 1300 breast cancer deaths a year, equivalent to about
22 000 years of life being saved; a most welcome benefit to women and to the public
health.
But there is a cost to women's well-being. In addition to extending lives by early
detection and treatment, mammographic screening detects cancers, proven to be cancers
by pathological testing, that would not have come to clinical attention in the woman's
life were it not for screening, a phenomenon called overdiagnosis. The consequence of overdiagnosis
is that women have their cancer treated by surgery, and in many cases radiotherapy
and medication, but neither the woman nor her doctor can know whether this particular
cancer would be one that would have become apparent without screening and could possibly
lead to death, or one that would have remained undetected for the rest of the woman's
life.
The question the panel sought to answer was the level of overdiagnosis in women
screened for 20 years and followed to the end of their lives. Estimates of overdiagnosis
abound, ranging from near zero to 50%, but there are no reliable data to answer
this question. There has not even been agreement on how to measure it. On the basis
of follow-up of three RCTs, the panel estimated that in women invited to screening,
about 11% of the cancers diagnosed in their lifetime constitute overdiagnosis, and
about 19% of the cancers diagnosed during the period that women are actually in the
screening programme. However, the panel emphasises that these figures are the best estimates
from a paucity of reliable data. Any excess mortality stemming from investigation
and treatment of breast cancer is considered by the panel to be minimal and considerably
outweighed by the benefits of treatment.
Putting together benefit and overdiagnosis from the above figures, the panel estimates
that for 10 000 UK women invited to screening from age 50 for 20 years, about 681
cancers will be found of which 129 will represent overdiagnosis, and 43 deaths from
breast cancer will be prevented. In round terms, therefore, for each breast cancer
death prevented about three overdiagnosed cases will be identified and treated. Of
the ∼307 000 women aged 50–52 who are invited to screening each year, just over 1%
would have an overdiagnosed cancer during the next 20 years. Given the uncertainties
around the estimates, the figures quoted give a spurious impression of accuracy.
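The round-number statements in the paragraph above follow from the cohort figures by direct division. The sketch below reproduces that arithmetic as an illustration only; the variable names are hypothetical, and the derived values carry the same uncertainty as the estimates they come from.

```python
# Cohort figures quoted in the text: 10 000 UK women invited to
# screening from age 50 for 20 years.
COHORT_SIZE = 10_000
CANCERS_FOUND = 681
OVERDIAGNOSED = 129
DEATHS_PREVENTED = 43
# Women aged 50-52 invited to screening each year (from the text).
INVITED_PER_YEAR = 307_000

# Overdiagnosed cases for each breast cancer death prevented.
overdiagnosed_per_death = OVERDIAGNOSED / DEATHS_PREVENTED

# Share of cancers found during the screening period that are overdiagnosed.
share_of_cancers_overdiagnosed = OVERDIAGNOSED / CANCERS_FOUND

# Chance that an invited woman has an overdiagnosed cancer over 20 years.
overdiagnosis_risk = OVERDIAGNOSED / COHORT_SIZE

# Illustrative scaling of that risk to one year's newly invited women.
overdiagnosed_per_annual_cohort = INVITED_PER_YEAR * overdiagnosis_risk

print(f"Overdiagnosed per death prevented: {overdiagnosed_per_death:.1f}")
print(f"Share of cancers overdiagnosed: {share_of_cancers_overdiagnosed:.1%}")
print(f"Risk of overdiagnosis per woman invited: {overdiagnosis_risk:.2%}")
print(f"Overdiagnosed per annual cohort: {overdiagnosed_per_annual_cohort:,.0f}")
```

The ratio of 129 to 43 gives the "about three" overdiagnosed cases per death prevented, 129 of 681 matches the estimate of about 19% for cancers diagnosed during the screening period, and 129 per 10 000 is the "just over 1%" risk for each woman invited.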
6.3 Policy recommendations
The panel concludes that the UK breast screening programmes confer significant benefit
and should continue. The greater the proportion of women who accept the invitation
to be screened, the greater is the benefit to population health in terms of reduction
in mortality from breast cancer. However, for each woman the choice is clear: on the
plus side, screening confers reduction in the risk of mortality from breast cancer
because of early detection and treatment. On the negative side is the knowledge that
she has perhaps a 1% chance of having a cancer diagnosed and treated that would never
have caused problems had she not been screened.
Evidence from a focus group the panel conducted, in line with previous similar
studies, was that many women feel screening is an offer worth accepting:
the treatment of overdiagnosed cancer may cause suffering and anxiety, but that suffering
is worth the gain from the potential reduction in breast cancer mortality. Clear communication
of these harms and benefits to women is of utmost importance and goes to the heart
of how a modern health system should function. There is a body of knowledge on how
women want information presented, and this should inform the design of information
to the public.