Mammography Screening—Benefits, Limitations, and Potential Harms
The randomized controlled trials
The efficacy of mammography screening has been well established through the accumulation of evidence from the RCTs and more recent evaluations of modern service screening. There have been eight population-based RCTs of breast cancer screening, and two RCTs that randomized volunteers (the Canadian National Breast Screening Studies [NBSS] 1 and 2). Among the RCTs, five were carried out in Sweden, three in North America, and two in the UK. The earliest RCT of breast cancer screening was the Health Insurance Plan of Greater New York Trial (HIP), which was initiated in 1963. Nearly 30 years later (1991), the United Kingdom (UK) launched the Age Trial, which was designed specifically to measure the benefit of mammography screening among women in their 40s, without any age migration past 50 years.
The RCT with the longest follow-up (29 years) is the Swedish Two-County Trial, which was the first trial to demonstrate a breast cancer mortality reduction associated with invitation to mammography screening without CBE. Its results are worth highlighting because they show that long-term follow-up of the RCTs is needed to measure the full impact of screening. In the Two-County Trial, 133,065 women aged 40 to 74 and residing in two Swedish counties were randomized into a group invited to three to four rounds of mammographic screening over a 7-year period, and a control group receiving usual care. From the first evaluation 8 years after the study began through 29 years of follow-up, breast cancer deaths were a highly significant 31% lower in the group invited to screening than in the control group (relative risk [RR] = 0.69; 95% confidence interval [CI], 0.56–0.84; P < .0001). It is noteworthy that the statistically significant reduction in breast cancer mortality not only persisted throughout the follow-up period, but the absolute benefit of an invitation to screening also improved over time and was still improving after 20 years. In fact, most of the breast cancer deaths prevented occurred 10 years or longer after the inception of screening. Very-long-term follow-up (ie, > 20 years) is therefore needed to approach the full impact of mammography screening, in particular the absolute benefit. For example, the number of women needed to screen (NNS) every 2 to 3 years over a 7-year period to save one life was 922 at 10 years, 464 at 20 years, and 414 at 29 years. If screening had been carried out for 10 years and the same relative benefit had been achieved, the absolute benefit would have been greater (a lower NNS), that is, an estimated 300 women needed to screen to save one life.
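Because the NNS is the reciprocal of the absolute risk reduction, the Two-County figures can also be restated as deaths prevented per 1000 women screened during the 7-year screening period. The following is a minimal Python sketch of that arithmetic; the function and variable names are illustrative and do not come from the trial report.

```python
# NNS values reported for the Swedish Two-County Trial,
# keyed by years of follow-up.
NNS_BY_FOLLOW_UP_YEARS = {10: 922, 20: 464, 29: 414}


def deaths_prevented_per_1000(nns: float) -> float:
    """Convert a number needed to screen into deaths prevented per 1000 screened."""
    if nns <= 0:
        raise ValueError("NNS must be positive")
    return 1000.0 / nns


if __name__ == "__main__":
    for years, nns in NNS_BY_FOLLOW_UP_YEARS.items():
        rate = deaths_prevented_per_1000(nns)
        print(f"{years} years: NNS {nns} -> {rate:.2f} deaths prevented per 1000")
```

A falling NNS and a rising number of deaths prevented per 1000 are the same observation seen from two sides: the absolute benefit keeps growing as follow-up lengthens.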
The absolute benefit can be expressed in other ways as well: at 29 years of follow-up, one life was saved for every 1334 screening mammograms, or for every 1000 women from the ages of 40 to 69 screened every 2 years, between 8 and 11 breast cancer deaths would be prevented.
Figures 3a and 3b show the summary relative risk (RR) of breast cancer mortality in the groups invited to screening compared with the control groups for the 8 population-based RCTs (RR = 0.77; 95% CI, 0.73–0.86) and 10 population-based and non–population-based RCTs (RR = 0.79; 95% CI, 0.73–0.86), a combined estimate of 23% and 21% fewer deaths associated with an invitation to screening. Over the years, meta-analyses with various RCT inclusion criteria have been conducted to provide overall RRs and age-specific RRs, ranging from 14% to 23% fewer breast cancer deaths associated with an invitation to screening. As seen in Figure 3, however, five of the trials showed greater mortality reductions than the summary statistic.
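The percentages quoted above follow directly from the relative risks: the reduction is (1 - RR) × 100. The short Python sketch below, with illustrative function names, shows the conversion and why the ends of a confidence interval swap when an RR interval is restated as a percent reduction.

```python
from typing import Tuple


def percent_reduction(rr: float) -> float:
    """Percent fewer deaths implied by a relative risk (RR < 1 means benefit)."""
    return (1.0 - rr) * 100.0


def reduction_interval(rr_low: float, rr_high: float) -> Tuple[float, float]:
    """Restate an RR confidence interval as a (low, high) percent reduction.

    The upper RR bound gives the smallest reduction, so the ends swap.
    """
    return percent_reduction(rr_high), percent_reduction(rr_low)


if __name__ == "__main__":
    # Point estimates as reported in the text.
    for label, rr in [("Two-County Trial", 0.69),
                      ("8 population-based RCTs", 0.77),
                      ("all 10 RCTs", 0.79)]:
        print(f"{label}: RR {rr} -> {percent_reduction(rr):.0f}% fewer deaths")

    # Interval reported for the 10-trial summary estimate.
    low, high = reduction_interval(0.73, 0.86)
    print(f"RR CI 0.73-0.86 -> {low:.0f}% to {high:.0f}% fewer deaths")
```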
The results of these meta-analyses should be regarded as conservative estimates of the effectiveness of mammography performed in the 1970s to 1980s, for several reasons. First, not all of the RCTs were equally successful in reducing the risk of being diagnosed with an advanced breast cancer, which is the principal purpose of mammography screening (Table 2). RCTs that succeeded in significantly reducing the risk of being diagnosed with an advanced breast cancer in the group invited to screening also eventually demonstrated a similar, significant reduction in the risk of dying from breast cancer. Second, the RCT results are based on intention-to-treat analyses, that is, the summary RRs are based on the difference in the breast cancer death rate between the invited group (attenders and nonattenders combined) and the control group, regardless of individual exposure to mammography screening. The mortality rate among women who attended screening regularly is lower than among all invited women, so intention-to-treat results understate the benefit to women who are actually screened. Close examination of the individual studies, rather than blurring those differences in a meta-analysis, provides a more informative and evidence-oriented approach to evaluating the results of the RCTs. Third, breast imaging technology has improved considerably over the nearly 50 years since the first breast cancer RCT was launched. These improvements fall into two groups: 1) tailoring the screening interval to a woman’s age, based on research results demonstrating the different tumor growth rates according to age and histologic tumor types; and 2) technical improvements that include developments in screen-film systems and FFDM systems, the contribution of quality assurance algorithms to image quality, the appreciation of the importance of two-view mammography, double reading, the use of computer-aided detection (CAD), and the emerging use of automated ultrasound screening or tomosynthesis as an adjunct to two-view mammography only in women with dense breasts.
In the workup of screening findings, the addition of ultrasound and breast MRI to diagnostic evaluation and the use of interventional methods to establish a microscopic diagnosis preoperatively both play an important role in arriving at the final diagnosis and initiating proper patient management. Several decades of experience and significantly better imaging technology and performance have contributed to considerable improvements in breast cancer screening and diagnostic imaging.
The evaluation of service screening
In the post-RCT era, numerous evaluations of service screening have taken place, applying case-control and cohort study designs. Case-control studies tend to measure the actual effect of exposure to screening, whereas cohort studies may measure the effect of attending screening, an invitation to screening, or both. In the majority of studies in which mammography has been offered to the public for a significant duration of time, results show that mortality reductions associated with an invitation to screening usually are equal to or better than those observed in the RCTs. In 2005, Gabe and Duffy summarized both the methodological challenges of evaluating screening in a nonexperimental setting and the results from 38 nonrandomized studies of breast cancer screening. The results indicated that breast cancer mortality reductions on the order of 30% to 40% were associated with screening. Since then, further observational studies have been published, the majority of which indicate a substantial and significant reduction in breast cancer mortality with screening.[18-25] In addition, the association observed in the RCTs between a reduced risk of being diagnosed with an advanced breast cancer and the subsequent mortality reduction also is evident in service screening studies that have examined tumor characteristics in exposed and unexposed women. For example, in a large Swedish study (23,092 cancers and 10,177,113 person-years of observation), the rates of lymph node–positive cancers, of tumors with pathological size > 2 cm, and of tumors of TNM stage II or worse were compared before and after the introduction of screening. Rates were adjusted for changes in overall incidence during the period of study and stratified by age (40 to 49 and 50 to 69 years).
In the period after screening was introduced, among women aged 40 to 49, there was a significant 45% reduction in tumors greater than 2 cm among women exposed to screening compared with the prescreening period, and a 33% reduction in the 50- to 69-year age group. Somewhat smaller but statistically significant reductions in lymph node–positive tumors and stage II tumors also were observed for all age groups in women exposed to screening, compared with the prescreening period.
An increasingly common expression of screening costs is the NNS or the number needed to invite (NNI) to prevent one breast cancer death. Results from major organized service screening programs and long-term follow-up of the randomized trials indicate that screening 300 to 400 women for up to 10 years will result in prevention of one breast cancer death.[9,23,27] In contrast, the Nordic Cochrane review finds a very small benefit, on the order of 2000 women needed to invite to screening throughout a 10-year period to prevent one breast cancer death. How do we reconcile these different estimates? One limitation of the NNI is that it is a poor proxy of exposure to screening, since it is inflated by the rate of nonparticipation in screening. This is especially so when the NNI is derived from a meta-analysis, in which variable rates of nonattendance, numbers of screening rounds, and durations of follow-up in multiple RCTs further distort the estimate. Perhaps the greatest problem with the estimate from the Nordic Cochrane review is that it is not based on an observed outcome but rather on the authors’ estimate of the outcome based on their subjective judgment of the quality of the screening trials. In contrast, estimates that cluster between 300 and 400 women needed to screen to save one life are derived directly from empirically observed data in randomized trials and service screening programs.[9,23,27]
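The inflation of the NNI by nonparticipation can be illustrated with a simplified sketch. It assumes, for illustration only, that every prevented death occurs among women who actually attend screening, so that the NNS equals the NNI multiplied by the attendance rate. The NNI and attendance rates used below are hypothetical and are not taken from any trial or review.

```python
def nns_from_nni(nni: float, attendance: float) -> float:
    """Approximate the number needed to screen from the number needed to invite.

    Simplifying assumption: the benefit accrues only to attenders, so
    inviting `nni` women means screening `nni * attendance` of them.
    """
    if nni <= 0:
        raise ValueError("NNI must be positive")
    if not 0.0 < attendance <= 1.0:
        raise ValueError("attendance must be in (0, 1]")
    return nni * attendance


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the direction of the effect.
    hypothetical_nni = 1000
    for attendance in (1.0, 0.75, 0.5):
        nns = nns_from_nni(hypothetical_nni, attendance)
        print(f"attendance {attendance:.0%}: NNI {hypothetical_nni} -> NNS {nns:.0f}")
```

The lower the attendance, the further the NNI overstates the number of women who must actually be screened. Differences in the number of screening rounds and in the duration of follow-up are not modeled here.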
Potential harms associated with screening
Harms associated with screening derive mostly from adverse effects associated with further examination of the mammographic findings. These include added patient financial cost and inconvenience, anxiety associated with positive test results, biopsy for benign lesions, and overdiagnosis. One has to keep in mind that there are numerous hyperplastic breast changes that mimic the mammographic appearance of breast cancer. Differentiating them from true malignancies necessitates the use of additional diagnostic tools, including needle biopsies, to arrive at a reliable microscopic diagnosis. While these procedures would constitute “harm” when findings are benign, they decrease the occurrence of unnecessary surgical biopsies, which constitute a greater harm. In contrast to the considerable benefits of screening, including a significant decrease in advanced cancers and decreased mortality, “harms” generally include experiences that range from hardly injurious at all to those that are somewhat injurious, such as invasive procedures to rule out the presence of a malignancy.
The potential harm that has received the greatest attention, and which receives the greatest emphasis from critics of screening, is overdiagnosis. Overdiagnosis usually is defined as the diagnosis of a breast cancer by screening that would never have been diagnosed in the patient’s lifetime if screening had not taken place. An “overdiagnosed” breast cancer is pathologically indistinguishable from a breast cancer that is progressive and potentially life-threatening, and thus, when treated, represents overtreatment. If molecular or histologic features could distinguish an indolent cancer from one that is progressive, overtreatment could be avoided. Since this is not the case, however, overdiagnosis must be understood as a statistical concept, measured by examining observed vs expected incidence rates associated with a screening program.
Estimates of overdiagnosis vary widely, reflecting the different methodological approaches to its measurement.[23,29-36] There are two major challenges in estimating overdiagnosis of breast cancer due to screening: 1) taking account of pre-existing trends in breast cancer incidence when estimating the expected incidence in the absence of screening; and 2) distinguishing the excess incidence related to lead time from that resulting from overdiagnosis. Studies that fully take account of pre-existing trends and lead time tend to estimate overdiagnosis rates of 1% to 11%,[23,29,30,34-36] whereas those that do not take these factors into account estimate overdiagnosis rates of 30% or more.[31,32] Therefore, while it is likely that some overdiagnosis results from breast cancer screening, it appears to be a relatively minor phenomenon; moreover, the risk of overdiagnosis in a screened woman is less than the probability of a breast cancer death prevented.
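As a statistical concept, overdiagnosis is the excess of observed over expected incidence, expressed relative to the expected incidence. The Python sketch below shows how much the estimate depends on the expected count. It assumes that both counts are cumulative over a follow-up long enough for lead-time excess to have been offset, which is one of the hard parts in practice and is not solved here. All counts are hypothetical.

```python
def overdiagnosis_percent(observed_cases: float, expected_cases: float) -> float:
    """Excess incidence as a percentage of expected incidence without screening."""
    if expected_cases <= 0:
        raise ValueError("expected_cases must be positive")
    return 100.0 * (observed_cases - expected_cases) / expected_cases


if __name__ == "__main__":
    # Hypothetical cumulative counts, for illustration only.
    observed = 1100
    expected_ignoring_trend = 1000   # prescreening rate carried forward unchanged
    expected_with_trend = 1060       # same rate projected along a rising trend

    naive = overdiagnosis_percent(observed, expected_ignoring_trend)
    adjusted = overdiagnosis_percent(observed, expected_with_trend)
    print(f"ignoring pre-existing trend: {naive:.1f}%")
    print(f"allowing for the trend:      {adjusted:.1f}%")
```

With the observed count held fixed, an expected count that ignores a rising background trend is too low and yields a larger apparent excess.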
A more commonly encountered side effect of screening is the recall for further assessment of suspicious mammographic findings in women who do not have breast cancer, so-called “false positives.” In Europe, the cumulative risk of such a false-positive exam over 10 screening rounds is approximately 20%, with corresponding cumulative risks of invasive procedures ranging from 1% to 4%. In the US, the estimated cumulative risk of at least one false-positive examination over a 10-year period for women undergoing biennial screening was 41.6%, with a corresponding 4.8% cumulative risk of a recommendation for biopsy. The cumulative risk of at least one false-positive examination is higher if a woman undergoes annual screening.
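The link between per-round and cumulative false-positive risk can be sketched under the simplifying assumption that every screening round carries the same, independent false-positive probability. Published cumulative risks are estimated from longitudinal data and not from this formula, so the per-round values it implies are only rough illustrations. Treating 10 years of biennial screening as 5 rounds is also an assumption.

```python
def cumulative_risk(per_round_risk: float, rounds: int) -> float:
    """Probability of at least one false positive in `rounds` independent rounds."""
    return 1.0 - (1.0 - per_round_risk) ** rounds


def implied_per_round_risk(cumulative: float, rounds: int) -> float:
    """Per-round risk that would produce `cumulative` under the same assumption."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / rounds)


if __name__ == "__main__":
    # Cumulative risks quoted in the text.
    europe = implied_per_round_risk(0.20, rounds=10)
    us_biennial = implied_per_round_risk(0.416, rounds=5)  # assumed 5 rounds in 10 years
    print(f"Europe, 10 rounds: about {europe:.1%} per round")
    print(f"US, biennial for 10 years: about {us_biennial:.1%} per round")

    # Assumption: the per-round risk stays the same while annual screening
    # doubles the number of rounds in the same 10 years.
    annual = cumulative_risk(us_biennial, rounds=10)
    print(f"Same per-round risk, 10 annual rounds: {annual:.1%} cumulative")
```

Under this assumption, more rounds in the same period always raise the cumulative risk, which matches the direction of the difference between annual and biennial screening. In practice the per-round recall rate may itself differ by screening interval.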