Mathematical Modeling for Breast Cancer Risk Assessment

ABSTRACT: Women at increased risk of breast cancer have important opportunities for early detection and prevention. There are, however, serious drawbacks to the available interventions. The magnitude of breast cancer risk is a crucial factor in the optimization of medical benefit when considering the efficacy of risk-reduction methods, the adverse effects of intervention, and economic and quality-of-life outcomes. Breast cancer risk assessment has become increasingly quantitative and is amenable to computerization. The assembly of risk factor information into practical, quantitative models for clinical and scientific use is relatively advanced for breast cancer, and represents a paradigm for broader risk management in medicine. Using a case-based approach, we will summarize the major breast cancer risk assessment models, compare and contrast their utility, and illustrate the role of genetic testing in risk management. Important considerations relevant to clinical oncology practice include the role of risk assessment in cancer prevention, the logistics of implementing risk assessment, the ramifications of conveying risk information with limited genetic counseling, and the mechanisms for genetics referral. Medical professionals can embrace new preventive medicine techniques more effectively by utilizing quantitative methods to assess their patients’ risks. [ONCOLOGY 16:1082-1099, 2002]

For women at increased risk of breast cancer, important opportunities exist for primary and secondary
prevention. Effective medical triage requires that risk be recognized and
quantified. An extensive body of literature describes the hormonal/reproductive,
family history, histologic, and demographic factors that contribute to breast
cancer risk. The concept that clinicians should identify women at high risk for
breast cancer has come of age. The justification for practicing breast cancer
risk assessment encompasses the following reasons:

  • The importance of maintaining a high level of suspicion for clinical
    diagnosis, despite the young age of a patient[1]
  • The need to begin surveillance earlier than recommended by standard
  • Better information about the effectiveness of prophylactic mastectomy,
    the ideal surgical approach, and the optimal age at surgery[4-7]
  • The opportunity for breast cancer chemoprevention[8]
  • Recognition of the risks of additional preventable cancers, such as
    ovarian cancer in BRCA1 and BRCA2 carriers
  • The chance to treat not only high-risk patients, but also the high-risk

Genetic counseling for inherited cancer syndromes has grown tremendously over
the past several years, due in large part to the discovery of two genes, BRCA1
and BRCA2, mutations of which account for the majority of hereditary
breast/ovarian cancer families.[9,10] Mutations in several other genes also
confer susceptibility to breast cancer—namely, TP53 (aka p53) associated with
Li-Fraumeni syndrome and PTEN associated with Cowden disease. These conditions
account for less than 1% of hereditary breast cancer, and no available
mathematical modeling incorporates them. Therefore, they will not be discussed
further in this article.

Genetic testing for mutations in BRCA1 and BRCA2 can be thought of as a
highly sophisticated method of risk assessment. However, for the majority of
women, genetic testing is not useful in clarifying risk. Mathematical models can
be used to identify families for whom testing may be beneficial and to estimate
risk in the absence of genetic testing.

For most women at moderate risk (loosely defined as those from a non-Jewish family with
one or two relatives with breast cancer and no ovarian cancer or male breast
cancer), quantitative risk assessment alone may be sufficient for guiding
medical decision-making about chemoprevention, surgical prevention, and
assessment of the risk/benefit ratio for hormone replacement therapy. Using a
case-based approach, we will summarize the major breast cancer risk assessment
models, compare and contrast their utility, and illustrate the role of genetic
testing in risk management.

Models for Breast Cancer Risk Assessment

Breast cancer is a common disease—the most common cancer found among women
and the second leading cause of cancer death. Preliminary searches for the causes
or risk factors for breast cancer have been population-based. After female
gender, the most important risk factor is increasing age. Composite incidence
projections derived from the Surveillance, Epidemiology, and End Results (SEER)
registry of the National Cancer Institute (NCI) have enabled the determination
of general age-related population risks for breast cancer.[11] The next largest
risk factor is family history. Early quantification of this influence consisted
of empiric prevalence tables based on various configurations of affected

Relative risks and odds ratios for various characteristics have been derived
from several studies; however, an individual woman’s risk is based on a
combination of these factors. Therefore, statistical modeling that incorporates
the relative weight of separate risk factors is necessary to approximate an
individual’s unique risk. Ideally, the model is then validated in population
studies. Of the models discussed here, only the Gail model[15] has been

Epidemiologic Models

The quantitative models currently used in breast cancer risk assessment can
be loosely divided into two categories: epidemiologic and genetic. The Gail[15]
and Claus[19] models are epidemiologic tools used to predict absolute breast
cancer risk over specified intervals of time for women who have never had breast
cancer. They are derived from large population-based datasets and, thus, apply
to a broad range of women, particularly those without a strong family history of
breast cancer (Table 1).

Genetic Models

The newest category of models estimates BRCA1 or BRCA2 mutation carrier
status (and, indirectly, breast cancer risk), based entirely on family history
of breast and ovarian cancer. These models were derived from small populations
with a strong family history of these diseases. Specifically, the Couch
(University of Pennsylvania),[20] Shattuck-Eidens,[21] and Myriad (Frank)
models[22] were derived from logistic regression of risk factors predicting a
positive mutation test outcome. The Berry-Parmigiani-Aguilar model (BRCAPRO)[23,24]
is based on Bayesian calculations of the probability of carrying a BRCA1 or
BRCA2 mutation, given the individual family pattern of affected and unaffected
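The Bayesian step behind this kind of calculation can be illustrated with a minimal sketch. It is not the BRCAPRO implementation: the function name and its inputs are hypothetical, and the likelihoods of the observed family pattern would in practice come from penetrance and prevalence estimates that are not reproduced here.

```python
def posterior_carrier_probability(prior, lik_if_carrier, lik_if_noncarrier):
    """Bayes' theorem for carrier status given an observed family pattern.

    prior             -- prior probability of carrying a mutation (assumed input)
    lik_if_carrier    -- P(family pattern | carrier), assumed input
    lik_if_noncarrier -- P(family pattern | non-carrier), assumed input
    """
    numerator = prior * lik_if_carrier
    denominator = numerator + (1.0 - prior) * lik_if_noncarrier
    if denominator == 0.0:
        raise ValueError("family pattern has zero probability under both hypotheses")
    return numerator / denominator


# Hypothetical numbers, for illustration only.
example_posterior = posterior_carrier_probability(0.1, 0.9, 0.1)
```

A family pattern that is much more likely under the carrier hypothesis pulls the posterior well above the prior, which is the intuition the model formalizes.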

The genetic models calculate mutation probabilities based on affected
individuals. Risk can be adjusted by Mendelian extrapolation for unaffected
relatives. Brief descriptions of each model are presented below and in
Table 2; a detailed discussion of their derivations can be found
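Mendelian extrapolation to unaffected relatives can be sketched as follows. The sketch assumes autosomal dominant inheritance, so each degree of relationship halves the probability; it does not adjust for the relative's age or unaffected status. The function name is hypothetical.

```python
def extrapolated_carrier_probability(affected_probability, degree_of_relationship):
    """Carrier probability for a relative of an affected individual.

    Assumes autosomal dominant transmission: the probability is halved
    for each degree of relationship (1 = first-degree, 2 = second-degree).
    """
    if degree_of_relationship < 1:
        raise ValueError("degree_of_relationship must be 1 or greater")
    return affected_probability * 0.5 ** degree_of_relationship


# If an affected woman has a 40% mutation probability (hypothetical value),
# her unaffected daughter's probability under this rule is half of that.
daughter_probability = extrapolated_carrier_probability(0.4, 1)
```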

Two other quantitative models of mutation carrier risk not detailed in this
paper are worth noting. First, Ford et al provide tables predicting the
probability of linkage to BRCA1 and BRCA2 for high-risk families with a minimum
of four cases of breast cancer diagnosed prior to age 60 and various
combinations of ovarian cancer and male breast cancer.[26] The probability of
linkage (an indirect measure of whether the gene in question is involved) does
not equate with the probability of finding a mutation, because a variety of
mutation types are not identified even by complete DNA sequencing of the coding
region and intron/exon boundaries. Genetic testing detected BRCA1 or BRCA2
mutations in only 63% of families with linkage scores suggesting involvement of
these genes.

Second, Myriad Genetic Laboratories, Inc, provides and updates a set of
penetrance tables on its website, reporting the frequency of
BRCA1 and BRCA2 mutations for various constellations of family history,
including Jewish and non-Jewish ancestry. The data in these tables were not
obtained in a controlled research study and have not been statistically modeled.
Moreover, family history was not collected in a systematic, verifiable fashion.
Nevertheless, the dataset includes several thousand individuals who have
undergone genetic testing and is quite impressive.

Gail Model

Using multivariate logistic regression, the following risk factors for
developing breast cancer were identified in the Breast Cancer Detection
Demonstration Project (BCDDP) population: age at menarche, age at first live
birth, number of previous breast biopsies, number of first-degree relatives with
breast cancer, and current age of the individual.[27] In addition to these
characteristics, the demonstration of atypical hyperplasia on biopsy is
incorporated into the original Gail model as another multiplication factor.
Relative risk estimates were calculated for each of these parameters, and a
woman’s composite relative risk is obtained by multiplying the numbers
associated with each relative risk factor. Absolute risk—defined as the
probability of developing breast cancer over a specified time—is computed by
multiplying the composite relative risk by the baseline proportional hazards
estimation derived from the BCDDP population.
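The arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the published Gail calculation: the relative risk values are hypothetical placeholders, the baseline cumulative hazard is an assumed input, and competing causes of death are ignored.

```python
import math


def composite_relative_risk(factor_relative_risks):
    """Multiply the relative risks associated with each risk factor."""
    return math.prod(factor_relative_risks.values())


def absolute_risk(composite_rr, baseline_cumulative_hazard):
    """Probability of developing breast cancer over the interval.

    Assumes proportional hazards and no competing mortality. For small
    hazards this is close to composite_rr * baseline_cumulative_hazard.
    """
    return 1.0 - math.exp(-composite_rr * baseline_cumulative_hazard)


# Hypothetical relative risks, for illustration only (not Gail model values).
example_factors = {
    "age_at_menarche": 1.1,
    "previous_biopsies": 1.7,
    "first_birth_and_relatives": 2.0,
}
example_rr = composite_relative_risk(example_factors)
example_risk = absolute_risk(example_rr, baseline_cumulative_hazard=0.01)
```

Because the factors are multiplied, a modest elevation in several categories can produce a composite relative risk well above any single factor.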

The NCI website contains a breast cancer risk assessment tool in Windows
format, based on a revised version of the Gail
model[28] that was used to determine eligibility for the Breast Cancer
Prevention Trial.[8] It provides 5-year and lifetime risks for developing breast
cancer and differs from the original model in that (1) it predicts invasive
cancer only (the original predicted both invasive and in situ cancers), (2) the
baseline incidence is derived from SEER data (the original Gail model used
baseline data from the BCDDP population), and (3) it includes a separate
baseline incidence for black women (the original applied only to white women).

The Gail model is routinely used in cancer risk counseling to derive a
preliminary breast cancer risk estimate for unaffected women. It is not
applicable to women who have already had either in situ or invasive cancers.
Although the model has been formally validated in three studies[16-18] and can
accurately predict the rate of breast cancer development in populations, it
tends to overestimate risk for young women and underestimate risk for older
women. Some of the overprediction in younger women results from the fact that
the model was based on a population of women who were undergoing annual
screening mammography.

From the standpoint of genetic risk assessment, the main limitations of the
Gail model are that it does not incorporate breast cancer history for more than
two first-degree relatives and does not consider age at onset of cancer.
Furthermore, because second-degree relatives are not included, paternal family
history is ignored. It should also be pointed out that although risk models may
be accurate for populations, risk predictions for individuals may be of limited

Claus Model

A second epidemiologic model used to estimate a woman’s risk of developing
breast cancer over time is the Claus model.[19] Using segregation analysis on
data obtained from the Cancer and Steroid Hormone Study (CASH), tables were
constructed that predict cumulative probabilities for the occurrence of breast
cancer at different ages, depending on both the presence of breast cancer in
various combinations of first- and second-degree relatives and age at onset of
cancer. Although the Claus model is only useful for the subset of women with one
or two relatives with breast cancer, it may be more accurate than the Gail model
for this cohort, particularly in the setting of premenopausal breast cancer and
minor nonfamilial risk factors, and especially when there is a paternal family
history of breast cancer.

In general, the Gail and Claus models should be avoided in individuals with a
strong family history of cancer and used only with caution when genetic testing
has produced negative results.

Couch Model

The Couch model[20] is based on data from 169 women who were assessed at a
high-risk clinic and tested for mutations in the BRCA1 gene. Risk is based on
the average age at diagnosis of breast cancer in a woman’s family, ethnicity
(Ashkenazi Jewish descent or not), the presence of familial breast cancer only
or familial breast and ovarian cancer, and whether any individual has had both
breast and ovarian cancer. Risks are provided in tables.

Shattuck-Eidens Model

The Shattuck-Eidens model[21] is based on a subset of 593 women with either
breast or ovarian cancer who were evaluated in 20 familial risk clinics and
underwent full-sequence mutation analysis for BRCA1. Risk factors included in
the final model are based on the characteristics of both the proband and her
family. For the proband, the risk factors are breast or ovarian cancer status
including age at onset and Ashkenazi Jewish ancestry. For the family, risk
factors include breast or ovarian cancer status, but not age at onset or degree
of relatedness.

Cancer status for both the proband and family members is categorized
according to the presence of breast cancer alone, ovarian cancer alone, or both
cancers in the same individual. Bilaterality is also considered for the proband,
who must be affected for the model to be applicable. Limited risk values are
provided in graphs, but it is necessary to calculate the regression equation for
many families.
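Evaluating a logistic regression equation of this kind can be sketched as below. The intercept, coefficient names, and values are hypothetical placeholders; the published Shattuck-Eidens coefficients are not reproduced here.

```python
import math


def mutation_probability(intercept, coefficients, indicators):
    """Logistic regression: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))).

    coefficients -- mapping of risk-factor name to regression coefficient
    indicators   -- mapping of risk-factor name to its value for this family
    """
    logit = intercept + sum(
        coefficients[name] * value for name, value in indicators.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients and family, for illustration only.
example_coefficients = {"proband_ovarian_cancer": 1.2, "ashkenazi_ancestry": 0.9}
example_family = {"proband_ovarian_cancer": 1, "ashkenazi_ancestry": 0}
example_probability = mutation_probability(-2.0, example_coefficients, example_family)
```

Each risk factor shifts the log-odds additively, so the resulting probability always stays between 0 and 1 regardless of how many factors are present.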

