Measurement of Utilities and Quality-Adjusted Survival
Considerations of quality of life are integral to evaluating cancer treatment. Curative cancer therapy often involves substantial toxicity and impairment of health-related quality of life (HRQOL). Palliative cancer therapy may bring marked relief of symptoms even if it does not lengthen life dramatically. Despite widespread recognition of the importance of considering HRQOL in assessing the effectiveness and cost effectiveness of therapies to prevent and treat cancer, cancer clinical trials and economic analyses that take HRQOL into account in medical or economic decision making are remarkably rare. One of the major impediments to including quality of life considerations when choosing among strategies is the difficulty of evaluating tradeoffs between length and quality of life. As a result, HRQOL endpoints generally play no part in identifying the superior therapy in a randomized trial, for example, unless no measurable difference is found in any biologic endpoint.
The field of decision science offers a useful metric for concurrent consideration of quality and length of life outcomes, namely, quality-adjusted survival. The general notion is that if length of life is plotted against quality of life, then the area under the curve represents quality-adjusted survival, usually measured in quality-adjusted life years (QALYs) (see Figure).
This is an intuitively appealing measure. It makes explicit and quantitative a way of thinking about outcomes that patients and physicians already use in a more informal fashion in making medical decisions involving tradeoffs between length and quality of life. It is also extremely valuable for cost-effectiveness analysis, since it allows the benefits of alternative uses of health care dollars to be compared in like units that do justice to the palliative as well as the life-prolonging effects of treatment. Consequently, QALYs are the standard measure of effectiveness in economic analyses that take both length and quality of life into account.
But not all measures of quality of life can be used in this fashion. Calculation of quality-adjusted survival requires that quality of life be measured in such a way that the product of length and quality of life is meaningful, eg, that 1 year of life at a quality of x is exactly as desirable as 6 months of life at a quality of 2x. Measures of quality of life that have this property are called utilities.
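The area-under-the-curve idea can be illustrated with a short calculation. The following is a minimal sketch, assuming that quality of life is constant within each period of survival; the function name and the example values (x = 0.4) are illustrative and do not come from any study.

```python
def quality_adjusted_life_years(segments):
    """Sum duration x utility over (duration_years, utility) segments.

    Assumes utility is constant within each segment, so the area under
    the quality-versus-time curve is a sum of rectangles.
    """
    total = 0.0
    for duration_years, utility in segments:
        if duration_years < 0:
            raise ValueError("duration must be non-negative")
        if not 0.0 <= utility <= 1.0:
            raise ValueError("utility must lie between 0 (death) and 1")
        total += duration_years * utility
    return total


# Illustrative values only: 1 year at quality x = 0.4 versus 6 months at 2x = 0.8.
one_year_at_x = quality_adjusted_life_years([(1.0, 0.4)])
six_months_at_2x = quality_adjusted_life_years([(0.5, 0.8)])
```

Under these assumptions both outcomes yield the same number of QALYs, which is the property that distinguishes a utility from other quality of life scores.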
Utilities are defined as the quantitative measure of the strength of a person's preference for an outcome. By convention, utilities are measured on a scale of 0 to 1, in which 0 represents death and 1 represents excellent health. They differ from more familiar measures of quality of life in that they reflect how a patient values a state of health, not just the characteristics of the health state. The terms "utilities," "values," and "preferences" are sometimes used interchangeably.
Approaches to utility measurement have grown out of economic and decision analytic theory rather than empiric research. In general, utilities are global HRQOL measures. A respondent's utility for a given health state may be elicited in several different ways. The simplest measure is a rating scale. A variety of formats may be employed to elicit a rating scale assessment of global HRQOL, including a visual analog scale, "feeling thermometer," or a verbal numeric scale. Such an item might read:
"If 0 is death and 100 is perfect health, what number would you say best describes your current state of health?"
Rating scales are appealing and practical measures of global quality of life. They can be easily self-administered and have been shown to be reliable and valid. Unfortunately, they are not true utility measures. There is no reason to believe that a respondent who assigns a state of health a score of "50" on a 100-point rating scale would be willing to give up half of his or her life expectancy in exchange for a return to perfect HRQOL.
In contrast, true utility measures can be interpreted in this fashion because they ask about quality of life in exactly these terms. The classic utility measure, the standard (or reference) gamble, assesses the respondent's utility for his or her own quality of life (or that of a hypothetical health state) by asking how much he or she would risk to improve it. In a standard gamble, the respondent is asked to choose between life in a given clinical state with less than perfect quality of life and a gamble between death and perfect health. The probability of death in the gamble is systematically varied. The respondent's utility for that health state is the probability of perfect health in the gamble at which the respondent is indifferent between the gamble and the certain intermediate outcome. For example, to elicit a patient's utility for his current state of health using a standard gamble, one might ask:
"Would you agree to play a game of Russian roulette under the following conditions? The gun has 10 chambers and one bullet. If you get the bullet, you die immediately; if you get an empty chamber, you experience perfect health for the 20 years of life remaining to you. If you refuse to play, you experience your current state of health for the 20 years of life remaining to you."
If the subject refuses to play, he is offered another game in which the gun has, for example, 1,000 chambers and one bullet. If he agrees to this hypothetical game of Russian roulette, the conditions are changed again so that the gun now has 500 chambers and one bullet. This process is repeated iteratively until the number of chambers in the gun is such that the subject cannot decide whether it is more appealing to play or refuse to play. At that "point of indifference" the ratio of empty chambers to total chambers gives his utility for his current state of health.
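The arithmetic behind the point of indifference can be sketched in code. This is an illustration only: the function names are hypothetical, the respondent is simulated rather than interviewed, and the search uses simple bisection on the probability of perfect health, which is a substitute for (not a description of) the iterative questioning outlined above.

```python
def utility_from_chambers(total_chambers, bullets=1):
    """Utility implied by indifference: empty chambers / total chambers."""
    if total_chambers <= 0 or not 0 <= bullets <= total_chambers:
        raise ValueError("need 0 <= bullets <= total_chambers and total > 0")
    return (total_chambers - bullets) / total_chambers


def elicit_by_bisection(accepts_gamble, tolerance=0.001):
    """Narrow in on the probability of perfect health at which the
    respondent switches between refusing and accepting the gamble.

    accepts_gamble(p) is a hypothetical callback returning True when the
    respondent would play at probability p of perfect health.
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        p = (low + high) / 2
        if accepts_gamble(p):
            high = p  # would play at p, so indifference lies at or below p
        else:
            low = p   # would refuse at p, so indifference lies above p
    return (low + high) / 2


# Simulated respondent (an assumption): plays only if the chance of
# perfect health exceeds 0.9, so the true utility is 0.9.
estimated_utility = elicit_by_bisection(lambda p: p > 0.9)
```

A respondent who is indifferent when the gun has 10 chambers and one bullet has `utility_from_chambers(10)` equal to 0.9, and the simulated search converges to the same value within the stated tolerance.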
Because of the conceptual complexity of standard gambles, they are usually administered by an interviewer with the help of visual aids or by computer using programs designed specifically for this purpose. The visual aids and computer programs provide graphic illustrations of the probabilities involved in the question and have been shown to enhance respondent comprehension. It is not usually feasible to perform in-person interviews with or without the aid of a computer in the typical clinical trial; consequently, there are major practical impediments to collecting standard gamble utilities in this setting. Another characteristic of the standard gamble is that the utility elicited reflects not only how the respondent feels about the quality of life in the health state but also whether he or she is a risk taker or gambler.
An alternative utility measure that is somewhat less difficult to administer and is not influenced by the respondent's attitude toward risk is the time tradeoff. This question assesses the utility of a health state by asking how much time one would give up to improve it. The respondent is offered a choice between a set length of life in a given compromised health state and a shorter length of life in perfect health. The respondent's utility or strength of his or her preference for the compromised health state is given by the ratio of the shorter to the longer life expectancy at which the respondent finds the two choices equally desirable.
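The ratio described here is simple enough to state as a one-line calculation. The sketch below assumes the respondent has already reached the point of indifference; the function name and the 15-year and 20-year figures are illustrative.

```python
def time_tradeoff_utility(years_in_perfect_health, years_in_health_state):
    """Ratio of the shorter (perfect health) to the longer (compromised
    health) life expectancy at which the respondent is indifferent."""
    if years_in_health_state <= 0:
        raise ValueError("the longer life expectancy must be positive")
    if not 0 <= years_in_perfect_health <= years_in_health_state:
        raise ValueError("the perfect-health time cannot exceed the longer time")
    return years_in_perfect_health / years_in_health_state


# Illustrative values: indifferent between 20 years in the compromised
# state and 15 years in perfect health.
example_utility = time_tradeoff_utility(15, 20)
```

In this example the utility for the compromised state is 0.75; a respondent unwilling to give up any time would have a utility of 1.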
Although somewhat more straightforward than the standard gamble, the time tradeoff is still conceptually challenging and requires the respondent to evaluate hypothetical situations. A number of investigators have attempted to develop versions of the time tradeoff that can be self-administered in a clinical trial setting, but the reliability and validity of these techniques have not been rigorously evaluated, and anecdotal evidence suggests that respondent confusion is common.
Other less common techniques of utility assessment, including willingness to pay (used in cost-benefit analysis), equivalence, and magnitude estimation, are characterized by the same level of complexity as standard gambles and time tradeoffs.
Although all these techniques measure quality of life and preferences/values, they differ in their mathematical properties, assumptions, and ease of administration. Surprisingly, the values of utilities obtained using the standard gamble and time tradeoff techniques have been shown in several studies to be quite similar; consequently, they are often used interchangeably.
Although a rating scale does not produce a utility, several studies have shown that the mean utility for a population may be predicted from the mean rating scale value by use of a "transformation" that adjusts the score upward, as in the example below:
utility = 1.18 × (rating scale value), for rating scale value < 0.85
utility = 1, for rating scale value ≥ 0.85
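The example transformation can be written as a small function. This sketch assumes the rating scale value has been rescaled to run from 0 to 1, that the second branch applies to values of 0.85 or more, and that the result should be capped at 1 (the cap is an added assumption, since 1.18 times a value just under 0.85 slightly exceeds 1). As the text notes, the transformation is meant for population means, not individual scores.

```python
def rating_to_utility(mean_rating):
    """Predict a mean utility from a mean rating scale value (0 to 1)."""
    if not 0.0 <= mean_rating <= 1.0:
        raise ValueError("rating scale value must be rescaled to 0-1")
    if mean_rating >= 0.85:
        return 1.0
    # Cap at 1.0: an assumption added here, not part of the published example.
    return min(1.0, 1.18 * mean_rating)


# Illustrative means: 0.50 on the rating scale maps to 0.59.
predicted = rating_to_utility(0.50)
```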
Because of the difficulties inherent in direct utility assessment, there is growing interest in "hybrid" approaches that maintain the ease of administration of a traditional quality of life questionnaire while also producing utility estimates appropriate for use in clinical and economic decision making. These health state classification indices consist of two components: a simple health-related quality of life questionnaire that is completed by patients to generate descriptive data, and a formula that assigns a utility to each patient's set of responses to that questionnaire. The formula reflects the relative importance assigned to different domains of HRQOL by respondents in a reference population.
Examples of such systems include the Quality of Well-Being Index and the Health Utility Index. Approaches currently undergoing validation include EuroQol, a measure specifically designed for international use, and the Q-tility Index, a cancer-specific tool.
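The two components of a health state classification index (patient responses plus a scoring formula) can be illustrated as follows. Everything in this sketch is hypothetical: the domains, response levels, decrements, and the additive form of the formula are invented for illustration and are not taken from the Quality of Well-Being Index, the Health Utility Index, EuroQol, or the Q-tility Index.

```python
# Hypothetical decrements from perfect health for each response level.
# In a real index these weights would come from a reference population.
HYPOTHETICAL_DECREMENTS = {
    "mobility": {0: 0.00, 1: 0.10, 2: 0.25},
    "pain": {0: 0.00, 1: 0.15, 2: 0.30},
    "mood": {0: 0.00, 1: 0.05, 2: 0.20},
}


def index_utility(responses):
    """Assign a utility to one patient's questionnaire responses using a
    simple additive formula (an assumed form, chosen for clarity)."""
    utility = 1.0
    for domain, decrements in HYPOTHETICAL_DECREMENTS.items():
        utility -= decrements[responses[domain]]
    return max(0.0, utility)  # do not fall below 0, the value of death


# A patient reporting mild mobility limitation and mild pain, normal mood.
patient_utility = index_utility({"mobility": 1, "pain": 1, "mood": 0})
```

The patient supplies only descriptive answers; the valuation is carried entirely by the weights, which is what allows such an instrument to be self-administered.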
Despite the array of approaches available for utility assessment, a survey of the literature reveals that the most common source of utility estimates in calculations of quality-adjusted survival is "expert opinion." Typically, health care providers are asked to provide an estimate (on a scale of 0 to 1) of what they believe to be the quality of life of patients in a given health state, for example, metastatic breast cancer being treated with hormonal therapy. Although there are ample data to demonstrate that providers' assessments of quality of life are not particularly accurate surrogates for patient self-assessment, expert estimates may be adequate in some cases. For example, when comparing the difference in quality-adjusted survival between alternative treatments or assessing the cost effectiveness of one treatment versus another, expert estimates of the relevant utilities may suffice if it can be shown that the results of the analysis are not particularly sensitive to the value of the utility estimates.
A sounder source of utility data for such a "back of the envelope" calculation of quality-adjusted survival is provided by the published results of surveys in which patients and the general population were asked to rate specific health states. The quality of the methods used to collect these utilities is highly variable, however, and the number of conditions for which such data are available is relatively limited.
The best source of utility estimates is a survey performed specifically for a given analysis. There is considerable debate about who the subjects should be in such a survey: patients experiencing the health state, patients asked to evaluate hypothetical health states they have not yet experienced, or members of the general population.
To some extent, the choice of the optimal study population depends on the question being addressed. It is often argued that if the results are to be used in resource allocation decisions, the relevant utilities are those of the general population. Society's values should determine how society's resources are spent. The problem with this argument is that healthy members of the population may not be sufficiently knowledgeable about the health states they are being asked to rate to be good sources of judgments about the value of life in those health states.
Patients who have experienced or are experiencing the health state are the real experts, of course. While surveys of patients are often the ideal, there are practical impediments to performing direct utility assessment in clinical trials and clinical practice. More important, an argument can be made on theoretical grounds that the relevant preferences for decision making are not those of patients who are experiencing a particular health outcome. These utilities may be influenced by adaptation to illness or cognitive dissonance. In contrast, the utilities of a respondent evaluating an array of potential outcomes may be more appropriate for use in decision making. Health state classification indices therefore rely on patients to provide information on the nature of the impact of a given health state on quality of life, but use proxy decision makers as the source of the weights for generating a utility score for that state.
Until quite recently, the only procedure used to estimate quality-adjusted survival was decision analytic modeling. In such models, a decision tree or Markov model is constructed, and each health state or outcome is assigned a utility in order to generate estimates of quality-adjusted life expectancy (QALE), usually measured in QALYs. In the past, the utility estimates used in these models were nearly always based on "expert opinion" (ie, guesses). More recently, polls of health professionals or focus groups made up of patients have become more common.
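A Markov model of the kind described can be sketched as a cohort simulation. The states, transition probabilities, utilities, and 1-year cycle length below are all hypothetical, and the sketch omits refinements such as discounting and half-cycle correction.

```python
# Hypothetical annual transition probabilities between health states.
TRANSITIONS = {
    "well": {"well": 0.80, "sick": 0.15, "dead": 0.05},
    "sick": {"well": 0.00, "sick": 0.70, "dead": 0.30},
    "dead": {"dead": 1.00},
}
# Hypothetical utilities assigned to each state.
UTILITIES = {"well": 1.0, "sick": 0.6, "dead": 0.0}


def markov_qale(transitions, utilities, start_state, cycles, cycle_years=1.0):
    """Quality-adjusted life expectancy (in QALYs) over a fixed horizon."""
    distribution = {state: 0.0 for state in transitions}
    distribution[start_state] = 1.0
    qalys = 0.0
    for _ in range(cycles):
        # Credit the cohort for the cycle spent in its current states.
        qalys += cycle_years * sum(
            share * utilities[state] for state, share in distribution.items()
        )
        # Move the cohort forward one cycle.
        next_distribution = {state: 0.0 for state in transitions}
        for state, share in distribution.items():
            for target, probability in transitions[state].items():
                next_distribution[target] += share * probability
        distribution = next_distribution
    return qalys


qale_two_years = markov_qale(TRANSITIONS, UTILITIES, "well", cycles=2)
```

With these illustrative inputs, a cohort starting in the "well" state accrues 1.0 QALY in the first year and 0.89 QALY in the second.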
The development of the Q-TWiST (quality-adjusted time without symptoms of disease or toxicity of treatment) methodology has facilitated the use of observed survival data from clinical trials to estimate quality-adjusted survival. In this elegant method developed by Gelber et al, the following steps are performed:
1. Define health states likely to occur in patients in a clinical trial.
2. Partition survival into time spent in those states in each treatment group.
3. Assign a utility weight to each health state.
4. Multiply time spent in each health state by that utility weight.
5. Sum these values to estimate quality-adjusted survival in each treatment group.
6. Perform sensitivity analyses to identify the threshold utility values at which one treatment strategy would be preferred over another.
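Steps 3 through 6 can be sketched as follows, taking the partitioned survival times from step 2 as given. The three health states (time with toxicity, TWiST, and time after relapse), the mean durations in months, and the treatment arms are hypothetical values chosen for illustration; they are not results from any trial.

```python
# Hypothetical mean months spent in each health state, by treatment arm.
ARM_A = {"TOX": 6.0, "TWiST": 40.0, "REL": 8.0}
ARM_B = {"TOX": 1.0, "TWiST": 36.0, "REL": 14.0}


def q_twist(mean_months, weights):
    """Steps 4 and 5: weight the time in each state and sum."""
    return sum(mean_months[state] * weights[state] for state in mean_months)


def threshold_grid(arm_a, arm_b, steps=5):
    """Step 6: compare arms over a grid of utility weights.

    TWiST is assigned a weight of 1; the weights for toxicity and relapse
    are varied from 0 to 1. A positive difference favors arm A.
    """
    results = []
    for i in range(steps):
        for j in range(steps):
            u_tox = i / (steps - 1)
            u_rel = j / (steps - 1)
            weights = {"TOX": u_tox, "TWiST": 1.0, "REL": u_rel}
            difference = q_twist(arm_a, weights) - q_twist(arm_b, weights)
            results.append((u_tox, u_rel, difference))
    return results


grid = threshold_grid(ARM_A, ARM_B)
```

In this invented example, arm A is preferred at most combinations of weights, but arm B is preferred when time with toxicity is valued near 0 and time after relapse near 1; the boundary between the two regions is the threshold the sensitivity analysis is meant to reveal.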
In Q-TWiST applications to date, utilities have not been measured; rather, the preferred treatment for any possible combination of utilities has been identified. Measured utilities could be used in this method, however, and, in fact, several cancer clinical trials are now collecting utilities to be used in Q-TWiST analyses. In addition to shedding light on the nature of patient values about tradeoffs between length and quality of life, these trials will make important methodologic contributions to the field. They will generate much needed empiric data on the relationship between serial utility measures and both traditional health status measures and biologic outcomes, will provide evidence on the impact of quality adjustment on treatment comparisons, and will demonstrate the feasibility of administering a variety of utility assessment measures in the clinical trial setting. The best way to accumulate the data needed to advance the methodology for measuring utilities and quality-adjusted survival is to begin to do it.
1. Weinstein MC, Stason WB: Foundations of cost-effectiveness analysis for health and medical practices. N Engl J Med 296:716-721, 1977.
2. Tsevat J, Weeks JC, Guadagnoli E, et al: Using health-related quality of life information: Clinical encounters, clinical trials, and health policy. J Gen Intern Med 9:576-582, 1994.
3. Torrance GW: Measurement of health state utilities for economic appraisal: A review. J Health Econ 5:1-30, 1986.
4. O'Leary JF, Fairclough DL, Jankowski MK, Weeks JC: Comparison of time-tradeoff utilities and rating scale values in cancer patients and their relatives: Evidence for a possible plateau relationship. J Med Dec Making 15:132-137, 1995.
5. Kaplan R, Anderson JP: A general health policy model: Update and application. Health Services Research 23:203-235, 1988.
6. Torrance GW, Zhang Y, Feeny D, et al: Multi-attribute preference functions for a comprehensive health status classification system. Paper 92-18. Centre for Health Economics and Policy Analysis, McMaster University, 1992.
7. EuroQol Group: EuroQol: A new facility for the measurement of health-related quality of life. Health Policy 16:199-208, 1990.
8. Weeks J, O'Leary J, Fairclough D, et al: The "Q-tility Index": A new tool for assessing health-related quality of life and utilities in clinical trials and clinical practice. Proc ASCO 13:436, 1994.
9. Gelber RD, Goldhirsch A, Cavalli F: Quality-of-life-adjusted evaluation of adjuvant therapies for operable breast cancer. Ann Intern Med 114:621-628, 1991.
10. Nelson H, Weeks JC, Wieand HS: A phase III prospective randomized trial comparing laparoscopic-assisted colectomy versus open colectomy for colon cancer. Monogr J Natl Cancer Inst 19:51-56, 1995.