Measuring Quality of Life: 1995 Update

November 1, 1995

Often, new treatments for cancer are evaluated solely on the basis of increased survival, depriving us of valuable information about other benefits and drawbacks of these treatments. It is important to raise the question of the quality of life as a companion to the question of quantity of life. The trade-off is not always between toxicity vs survival time; sometimes a treatment, however toxic, affords benefit not by virtue of increasing survival, but by palliation of tumor-induced pain or obstruction. Included in this paper is a table that reviews many available quality of life measures that have been designed for, or frequently used with, people with cancer. Proper selection of measures and supplementary questions is an important first step toward a successful evaluation of quality of life. Samples of many of these scales are included in the appendix.

Introduction

The term quality of life (QOL), or health-related quality of life, has emerged to organize and galvanize a collection of outcome evaluation activities over the past two decades in cancer treatment research. Prior to this, length of survival was considered to be the only primary outcome in oncology treatment research. Recently, however, progress in increasing survival has been slow, and at times has exacted considerable cost [1].

It is now widely accepted that in most circumstances quality of survival is as important as quantity of survival. This implies that a severely toxic treatment must be evaluated for its detrimental impact as well as its survival benefit. It also raises a less obvious point: that treatments can be considered efficacious if they improve the quality of life even in the absence of survival benefit. Thus, investigating the impact of cancer treatments on QOL is a two-tailed enterprise where treatment toxicity is traded not only with survival time but also with post-treatment function and well-being.

QOL evaluation entails a multidimensional quantification of patient functional status, usually as perceived by the patient. In the decades to come, treatment intensification strategies which increase toxicity are likely to continue, given the advent of hematopoietic growth factors and improved antiemetic regimens. This further increases the importance of evaluating toxicity, patient function, and patient preferences for treatment. QOL evaluation differs from classical toxicity ratings in two important ways:

1. It incorporates more aspects of function (eg, mood; affect; social well-being) than those which have typically been attributed to treatment.

2. It focuses on the patient's perspective.

The United States Food and Drug Administration has stated that benefit to quality of life (QOL) is one of two requirements for approval of new anticancer drugs [2]. The other, of course, is improved survival. Given the incurability and increasing chronicity and prevalence of many forms of advanced cancer, the QOL endpoint has become very important. Industry has thus joined hands with the caring clinician in an unusual marriage, promoting supportive care and symptom relief in the name of quality of life.

Despite general acceptance of the value of assessing quality of life during cancer treatment, relatively few clinical trials actually include a QOL component. For example, fewer than 5% of clinical trials reviewed as of 1982 by the Department of Health and Human Services studied QOL [3]. A 1986 survey of surgical trials revealed that only 3% had systematically evaluated QOL [4]. In 1995, 15% of the currently active Eastern Cooperative Oncology Group (ECOG) trials include a QOL component. Although there are some prevailing attitudes which devalue the role of quality of life investigation in clinical trials, a larger obstacle to successful QOL research has to do with difficulty coordinating the social and medical sciences in a clinical setting.

Recently, however, developments in health-specific quality of life methodology have made accurate QOL evaluation a possibility. Dozens of measures, many of which are both practical and valid, have emerged over the past decade and are available for use. This paper discusses issues in the selection of patients and measures when studying quality of life during cancer treatment.

Is There a Gold Standard?

One of the purposes of this publication is to clarify the extent to which we can agree on a definition of quality of life as it applies to people with cancer. The closer we can come to agreement, the more likely we will be to prevent the use of inappropriate measures leading to inaccurate and confusing conclusions. Coming to agreement about a definition does not mean selecting one or a single set of measures; there is no "gold standard," and there cannot possibly be one until the construct as it applies to cancer is clarified. Even then, it would probably be unwise to name a gold standard; that would risk allowing the tail to wag the dog. As soon as a measure is accepted as complete, the investigator surrenders the opportunity to assess components not included in the scale even if they have major implications for QOL. For example, although most QOL scales measure common physical problems such as pain and nausea, most do not measure confusion. Confusion may be the linchpin of quality of life in a patient with a brain metastasis, or whose calcium level cannot be controlled. Should a QOL scale therefore measure confusion?

The same question could be asked of dozens of other clinical problems. If the decision about inclusion of an item in a quality of life scale were based upon the possibility that it could be important for any cancer patient, the gold standard scale would be very long indeed. Instead, the probability of occurrence and the relative importance of the item in the overall scheme must prevail. The modular approach, as described by the European Organization for Research and Treatment of Cancer (EORTC) QOL Working Group [5,6] and by Cella and colleagues [7-9], in which a core of general questions is supplemented with disease- and treatment-specific items, is a method for addressing this dilemma.

Defining Quality of Life

In their classic volume, Campbell and colleagues [10] describe quality of life as: "a vague and ethereal entity, something that many people talk about, but which nobody very clearly knows what to do about." While it may ring true, this description is a nightmare for the test developer. Some have suggested abandoning the term quality of life because it is too general to have meaning. Other less nihilistic observers have pointed out that because the current definition of the term is so vague, it has been exploited as a marketing tool [11]. There is a consensus of opinion, at the very least, that QOL is multidimensional [5,11-14].

The integrity of the term quality of life has been justifiably challenged on the grounds that it cannot be validly measured because it means so many different things to so many different people. With respect to both content and construct validity, this is certainly true. Until one has a clear definition of the concept, including its component parts if applicable, one cannot determine whether a scale is validly measuring that construct. The first step toward successful assessment of QOL in the clinical research setting is to clarify its definition and component dimensions.

We had earlier developed a working definition of quality of life which laid the groundwork for measurement: "Quality of life refers to patients' appraisal of and satisfaction with their current level of functioning as compared to what they perceive to be possible or ideal." [12] This earlier definition was modified to explicitly incorporate the multidimensionality of QOL: "Health-related quality of life (QOL) refers to the extent to which one's usual or expected physical, emotional, and social well-being are affected by a medical condition or its treatment." [15]

As the initial definition implies, it is important to obtain an appraisal of the extent of dysfunction as well as a rating of how this appraisal matches expectations. The appraisal itself is important because it documents the patient's report of actual dysfunction. The expectation rating is important because it provides the patient's opinion as to whether that dysfunction is tolerable. Some patients with minimal actual disability are extremely dissatisfied, while others seem quite able to tolerate severe impairment and may even feel fortunate to be obtaining therapy. Many decisions about treatment are best made with this knowledge.

Patients' perceptions of their illness are extremely variable, and factors other than actual disability enter into that perception. For example, bedridden status may be more upsetting to an adolescent receiving bone marrow transplantation than to an older adult with a history of chronic arthritis. For the adolescent, bedridden status represents a 100% decrease in normal activity level. For the older adult who could never expect to be fully ambulatory because of preexisting arthritis, the bedridden status represents less than a complete loss of possible ability. To assume that the same actual activity level in these two individuals would reflect comparable quality of life would be an obvious error.

A more subtle example is the presence of sexual dysfunction in a couple with an active and unconflicted sexual history, compared to the same dysfunction in a couple with a premorbid history of marital conflict and sexual difficulties. To the former couple, the same level of actual dysfunction would likely be more disruptive because it deviates more dramatically from their history. For the couple with premorbid sexual dysfunction, it is unwise to assume their difficulty can be attributed to cancer treatment.

Patient Selection

Access-Gaining access to patients is a significant issue when assessing QOL. Although there may appear to be adequate numbers of patients or families in a particular setting, many studies languish due to accrual problems. Very often, this is due to insufficient motivation to collaborate or even support the study on the part of the treatment staff. Studies in which all feel a commitment or some degree of participation are far more likely to be successful.

From the patient's perspective, quality of life questions, if not placed in a proper context, can be viewed as gratuitously intrusive. Questions must therefore be carefully and parsimoniously drafted. Refusal rates as high as 53% have been reported due to physician, patient, and family resistance arising from their perception of psychological research as irrelevant if not intrusive [16]. Piloting can determine acceptability to patients and families, and written consent can advise them of the nature of the inquiries.

Sampling-Except in cooperative multicenter trials, random or representative samples are rarely attainable. One must usually identify an available study population, and decide upon inclusion and exclusion criteria. Unless everyone will be seen, some strategy for selecting eligible participants must be executed (eg, random drawing; every fourth patient). Without a sampling strategy, the cohort of participants is reduced to a "sample of convenience," in which people are studied according to the availability of time to the researchers, or participant accessibility. This approach can bias study conclusions, because results could arguably be due to some factor that is correlated with the selection method. For example, a telephone study that relies upon reaching people at home may end up overrepresenting homebound or nonworking patients.
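The two selection strategies mentioned above (random drawing; every fourth patient) can be sketched in a few lines. This is a minimal illustration only: the function names, the patient identifiers, and the fixed seed are assumptions introduced here, not part of any published protocol.

```python
import random


def systematic_sample(eligible, step=4, start=0):
    """Select every `step`-th eligible patient, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return eligible[start::step]


def random_draw(eligible, k, seed=None):
    """Draw k eligible patients at random, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for audit
    return rng.sample(eligible, k)


# Hypothetical list of eligible patient identifiers, in order of presentation.
eligible_ids = [f"P{n:03d}" for n in range(1, 13)]

every_fourth = systematic_sample(eligible_ids, step=4)
drawn = random_draw(eligible_ids, k=3, seed=1995)
```

Either rule is fixed before accrual begins, so that selection does not depend on staff time or on how easy a given patient is to reach.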

Inclusion/Exclusion Criteria-It is in the interest of study feasibility to set minimal exclusion criteria. However, most quality of life research requires consideration of confounding effects of disease, treatment, and personality variables. The answer to how much or how little control should be exercised over inclusion and exclusion criteria depends upon the questions being asked in the study. Conventional wisdom is to set very strict inclusion/exclusion criteria, because a more homogeneous sample will provide more power to statistical tests of differences between groups. As a result, errors of overexclusion are generally more common than errors of overinclusion. However, it is important to note that power (the likelihood of detecting a true difference) is related to the heterogeneity of subject responses, and not the heterogeneity of entry characteristics. Therefore, in cases where no known or expected relationship exists between an entry characteristic and the responses of interest, the investigator ought not exclude patients on the basis of that entry variable.

An example would be gender differences and the development of nausea. Since no known relationship exists between gender and the development of nausea, there is no need to exclude one gender or otherwise control it as a variable. If a relationship between an entry variable and the outcome measure is expected, the entry variable should be made an independent variable, a stratification variable, or a covariate in the analysis.

Sample Size-Ideal sample size is determined by three factors: alpha level (usually set at P less than .05); power, or the probability that the study will accurately confirm the experimental hypotheses; and effect size, the extent to which the phenomenon under study truly differs among groups [17]. In the clinical trial setting it is often advisable to proceed with alpha = .05 and the known available sample size, and then determine, with the conventional amount of power desired (eg, .80), what effect size will be detected in the study.
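The calculation suggested above (fix alpha and the available sample size, then ask what effect size can be detected at the desired power) can be sketched as follows. The sketch assumes a comparison of two equal-sized groups on a continuous QOL score, a two-sided test, and the normal approximation; the function name and the example sample size are illustrative assumptions, and reference [17] remains the source for exact tables.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized mean difference (in standard deviation units)
    detectable when comparing two equal-sized groups.

    Uses the normal approximation for a two-sided test:
        d = (z_{1 - alpha/2} + z_{power}) * sqrt(2 / n_per_group)
    """
    if n_per_group < 2:
        raise ValueError("need at least 2 patients per group")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Hypothetical trial with 64 evaluable patients per arm.
d = detectable_effect_size(64)
print(f"{d:.2f}")  # prints 0.50
```

Under these assumptions, a trial with 64 patients per arm can expect to detect a difference of about half a standard deviation; smaller differences would require more patients.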

Selection of Measures

One of the most difficult tasks in quality of life research is to select measures and develop questions that:

1. Demonstrate a clear and significant contribution to patient care

2. Will not constitute an unacceptable burden to patients or staff

3. Are sufficiently sensitive to meet the needs of the investigator

4. Are perceived as unintrusive to patients and the treating staff.

There are many available quality of life scales, some of which are briefly described in Table 1. In this table, the first column provides the scale name and appropriate reference(s). The second column gives the population for which the scale was developed. The range is from a single tumor type to chronic illness in general. If a scale was not developed specifically for cancer patients, it was not included in the table unless it has been commonly used with cancer patients. Therefore, for example, the commonly used Psychosocial Adjustment to Illness Scale, a measure of adjustment to general illness, was included, whereas the array of quality of life measures in the cardiovascular disease and rheumatology literature were not included because they have not generally been used to measure quality of life in cancer patients.

The third column lists the dimensions that are tapped by each scale. This was the most difficult column to conceive and complete. It was difficult to conceive because every scale varied greatly with respect to the names given to the dimensions measured. It was difficult to complete because there were frequent discrepancies between the stated dimensions and the actual item content. It was necessary to decide whether to accept the authors' stated dimensions as what is truly reflected in the scale, or whether to rely more upon actual examination of item content. When factor analytic results were not reported, item content examination was preferred over the authors' stated dimensions. When factor analytic results were presented (which was rare), the reported factors were matched to one of the selected dimensions, and item review was used to add dimensions where relevant.

After reviewing the more than 30 different names for dimensions listed by various authors, seven were selected as sufficiently distinct to warrant separate listing. They were:

1. Physical concerns (symptoms; pain)

2. Functional ability (activity)

3. Family well-being

4. Emotional well-being

5. Treatment satisfaction (including financial concerns)

6. Sexuality/intimacy (including body image)

7. Social functioning

An additional two summary dimensions were also included:

8. Global evaluation of QOL (ie, whether the instrument has a question rating the patient's global or overall perception of QOL or health status)

9. Total score (ie, whether the dimension scores are summed to provide a total [summary] index of QOL).

Evaluating the Seven Dimensions-Given our current level of understanding about the interrelationships between dimensions of quality of life, it is not possible to evaluate the accuracy of this seven-dimension breakdown. In truth, there could be as few as three distinct dimensions (physical health status; social functioning; mental well-being); or perhaps even more than seven. Factor analytic and aggregate index studies have suggested that the physical domain should be divided into symptom (ie, physical experience) vs function (ie, physical abilities and activities) [13], and perhaps even further, dividing gastrointestinal toxicity into a separate group [14,18]. Emotional (mental) well-being, while related to physical well-being, is clearly distinct as well.

The social aspects of QOL have been notoriously the most difficult to capture with brief measurement approaches. Unfortunately, due to issues of cost and burden, brief measurement approaches appear to be essential in contemporary health services research. As a result, the social well-being dimension(s) have tended to be understudied and therefore remain the most poorly understood.

Scales That Were Excluded-No scale was included in the table if it measured only one dimension of quality of life. Pain scales, symptom and toxicity ratings, mood scales, scales to measure activities of daily living [19], and even the "classic" performance status scales [20,21] are not included in the table because they do not meet the multidimensional part of the definition. Global (one-item) ratings, like that of Gough and colleagues [22] or that of Bernheim and Buyse [23], of quality of life were also excluded from the table. Although they can arguably provide a summary index that totals, in a synthetic way, input from many dimensions, one-item measures make it impossible to determine anything specific about the nature of a change in the global score. They are functionally dimensionless, and were therefore excluded.

The fourth column of the table provides the number of items and the item response format (eg, Likert; visual analog). It also states whether the scale is a questionnaire (listed as "self-report"), whether it includes an interview, or whether it is rated by an observer (usually someone on the health care team, but sometimes a family member). Generally speaking, the information obtained from an observer, while perhaps having its own validity, does not agree strongly with patient self-ratings, and is therefore less trustworthy as quality of life data.

Pitfalls-Experience has shown that there are many practical pitfalls to relying upon questionnaires of quality of life [24-27]. Missing items, misunderstood instructions, inconsistent responses, and language and reading barriers are just a few of the many risks incurred by an over-dependence upon written self-report. In addition, even after controlling for much of the bias introduced by these potential pitfalls, there is evidence that questionnaires are less sensitive than a probing interview in obtaining accurate QOL data [28]. Therefore, it is prudent whenever possible to supplement any questionnaire with a clarifying interview. At the very least, data received from patient self-report should be checked for completeness before allowing the patient to leave.

Failure to monitor the quality of collected data at the time it is generated is perhaps the most common reason for failure of a quality of life investigation once it has gotten underway. The main resistance to interviewing patients is cost, not attitude, for most would agree that the richness of interview data is unmatched by other methods.

The fifth column in Table 1 provides general comments about the extent of the scales' use, and some evaluative statements about administration, scoring, and psychometric properties. Any scale should be critically evaluated by its prospective user along these lines, in order to determine its applicability in a particular study.

The sixth column indicates the available languages as of the time of this writing. Virtually all of the instruments were developed in the English language. EuroQol and the Rotterdam Symptom Checklist are exceptions. The quality of the translations or their validity in the field was not reviewed, so there are no evaluative comments in this column. There is increasing need for multilingual QOL instruments which are at least acceptable and preferably valid across cultures. Translation of an existing single-language document ideally involves an iterative forward-backward-forward sequencing and review of difficulties on an item-by-item basis. The final translated document should then be pretested for acceptability and content validity, then implemented in multilingual clinical trials, where the derived data can then be tested statistically for cross-cultural equivalence or bias.

Finally, the seventh column summarizes the presence or absence of disease- and treatment-specific subsets of questions. These specific questions may be of benefit when added to a general measure of QOL. Together, they can provide comparability across diseases (types of cancer, in this context), and sensitivity to specific issues or symptoms relevant to a given disease or treatment. For example, lymphedema is important to women who have had breast surgery, and should arguably be included in a breast cancer quality of life instrument. However, the question is irrelevant to other cancers. The availability of disease-specific questions which need not be asked of all patients is therefore an asset because it allows for the ideal combination of questionnaire length and content covered.

Scale Selection Strategy

One generally has two options in selecting QOL measuring instruments. The first is to pick an established instrument that has demonstrated reliability and validity. This approach runs the risk of excluding important questions not included on the selected scale. The second option is to construct an instrument based upon clinical knowledge of the problem. This approach assures the investigator of having asked the important questions, but it then raises the question of reliability and validity of measurement.

Of these two options, the one more frequently chosen is the use of existing questionnaires. Disease or treatment-sensitive questions can then be added into the trial, but with uncertain reliability and validity. The frequent absence of an appropriate, specific set of valid questions places the investigator in the uncomfortable position of worrying that important information may be untapped by an insensitive measure. This fear of missing what is most important can lead to a burdening of the patient with a wide array of questionnaires given as a protection against the investigator's uncertainty. For many reasons (eg, responder burden, statistical validity), investigators are advised to be judicious in their selection of questionnaires. It is also useful to remember that as patient burden rises, accrual rate drops and attrition increases.

These pressures can tempt investigators to select desired items or sets of items from previously validated tests. It is a mistake to assume that this is a valid practice. Assuming one had the scale author's approval, the only sound reason for drawing out parts of tests is that the investigator liked their face validity. This is not to say that the selected items could not be validated on their own after being selected and administered to new samples. That has been done within the Cancer and Leukemia Group B (CALGB), which shortened the 65-item Profile of Mood States to 11 items [29]. The same approach has been used to shorten the Medical Outcomes Study questionnaire to 36 items [30], and subsequently to 20 items, and by the EORTC to shorten the Hospital Anxiety and Depression Scale from 14 to eight items [6].

Multimethod Approach-Careful and parsimonious selection of measures should not be confused with the importance of assessing an array of domains. Particularly with treatment studies, multiple measures of change are preferred over single measures, for two reasons. First, since most outcomes being measured are concepts that tend to have separate dimensions and are evolving in definition, it is important to assess the concept from different vantage points as a way of enhancing reliability of measurement. Enhancing reliability makes it possible to improve validity.

For example, if one is trying to examine and seek methods to improve self-care and general well-being in patients who have been discharged home from the hospital, it is useful to obtain information from the patient's perspective, a family member's point of view, and a rating by an objective interviewer (not a core member of the investigative team). Each of these information sources provides different and potentially disparate data, so the omission of one or two of them could reduce the reliability of the data, thereby diminishing the validity of the claim that self-care and well-being were assessed.

The second reason for preference of multiple measures over single measures is breadth of coverage. It may be worthwhile to examine dependent variables that are peripheral to the central medical study, because they may provide useful information when the study itself yields negative results. A good example of this is the assessment of quality of life in clinical trials that compare two or more different treatment regimens. Often, a large-scale clinical trial will contrast treatment regimens that possess different potentials for toxicity and medical risk. The advantage of adding quality of life measurement in such studies, besides the obvious advantages of large homogeneous groups of participants and random assignment to treatment arms, is that differences in quality of life can be used as part of the evaluation of efficacy of treatment when differences in response rate and survival do not emerge as significant. This level of contribution has been demonstrated in the treatment of lung carcinoma, for example, within the cooperative clinical trials setting [31], and in the treatment of breast cancer [32], as well as sarcoma [33].

Companion psychosocial assessment can also add information about treatment effects which can be included in later treatment considerations. For example, neuropsychological studies have shown that cranial irradiation, as central nervous system (CNS) prophylaxis in acute lymphocytic leukemia, is associated with long-term deficits in attention and overall intellectual functioning [34]. This has led to the limiting of prophylactic cranial irradiation to those children considered to be at high risk for CNS recurrence.

Questionnaire Reliability and Validity

Generally speaking, a test cannot be considered valid until it has demonstrated satisfactory reliability. Typically, reliability and validity of measurement are expressed in the form of correlation coefficients:

Reliability-Two synonyms for reliability are repeatability and consistency. Repeatability refers to the extent to which a questionnaire, applied two different times (test-retest), or in two different ways (alternate form and inter-rater), produces the same score. Consistency refers to the homogeneity of the items of a scale. A measure's internal consistency is usually expressed in terms of Cronbach's coefficient alpha, which can be computed easily with most social science statistical packages.

Test-retest reliability refers to the stability of a measure over time (ie, the correlation between test scores at assessment 1 and those at assessment 2). The time interval between assessments should be relatively brief (eg, 3 to 7 days), because it is hoped that true change in the patient's QOL will not have occurred during the interval. Any true change detected by the questionnaire will be interpreted as error and thereby deflate the correlation coefficient.

A ruler has perfect test-retest reliability because it will always produce the same score when applied to the same object. No QOL measure can ever expect to achieve such perfection, expressed as a correlation coefficient of 1.0, but close approximations (eg, correlations above .70) are important [35].
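As a minimal sketch, test-retest reliability can be computed as the Pearson correlation between total scores at the two assessments. The function name and the scores below are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least 2 scores each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical QOL total scores for six patients, assessed 5 days apart.
assessment_1 = [72, 55, 88, 61, 79, 94]
assessment_2 = [70, 58, 85, 66, 75, 91]

test_retest_r = pearson_r(assessment_1, assessment_2)
meets_convention = test_retest_r > 0.70  # the threshold cited in the text
```

A correlation summarizes whether patients keep their rank order between assessments; it does not by itself show that mean scores were unchanged.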

Alternate form reliability deals with the question of whether two versions of a test measure the same thing (also measured using a coefficient of correlation). Since there are currently no alternate forms of any commonly-used health status measures, this form of reliability is not relevant to the current discussion.

Internal consistency examines whether individual items within one test contribute consistently to the total score obtained. This is calculated as coefficient alpha. If a test is made up of 20 items that purport to measure QOL, then responses to each of the 20 items should correlate with one another and the total score, if the scale is to have high internal consistency. However, since QOL is not a unidimensional construct, a high overall internal consistency coefficient might not be necessary to obtain valid measurement. In light of this, it may be acceptable to use short QOL subscales with internal consistency (alpha) coefficients below the usual conventional value of .70.
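Coefficient alpha can be checked in any data set with a short calculation. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items; the function name and the layout of the data (one row of item scores per patient) are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's coefficient alpha.

    `responses` is a list of rows, one per patient; each row holds that
    patient's scores on the k items of a scale (no missing items).
    """
    n_patients = len(responses)
    k = len(responses[0]) if responses else 0
    if n_patients < 2 or k < 2:
        raise ValueError("need at least 2 patients and 2 items")
    if any(len(row) != k for row in responses):
        raise ValueError("every patient must have a score for every item")
    # Sample variance of each item across patients.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the total (summed) score across patients.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

For a multidimensional instrument, the function would be applied separately to the items of each subscale as well as to the full item set, which makes the contrast between subscale and total-score coefficients visible.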

Inter-rater reliability refers to the association between ratings of two or more independent judges. Naturally, this is an issue only for observer-rated QOL scales. Pearson correlation coefficients above .70 are generally considered acceptable, although it is preferable to achieve coefficients above .80. When ratings are done on a noncontinuous (categorical) scale, the coefficient should be corrected for the likelihood of chance agreement between judges (Cohen's Kappa).
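The chance correction mentioned above can be sketched as follows for two judges rating the same patients on a categorical scale. Kappa is computed as (observed agreement − chance agreement) / (1 − chance agreement); the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same patients."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equally long, non-empty rating lists")
    # Proportion of patients on whom the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four patients as "good" or "poor" functional status.
nurse = ["good", "good", "poor", "poor"]
relative = ["good", "poor", "poor", "poor"]
kappa = cohens_kappa(nurse, relative)
```

In this example the raters agree on three of four patients (75%), but half of that agreement would be expected by chance, so kappa is 0.5.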

Reliability is not a fixed property of a measure, but rather of a measure used with certain people under certain conditions. Because of this, reported reliability cannot be assumed to be generalizable and therefore should be reevaluated in later applications. This is not usually possible with test-retest reliability, because of the brief reassessment window. But internal consistency can (and should) easily be checked in any data set.

Internal consistency is dependent upon the number of items. As the number of items goes up, so too does the reliability coefficient. It is this observation that led to the Spearman-Brown correction of the split-half reliability technique, which evaluates the internal consistency of a test by splitting it in half and correlating the two halves (Cronbach's alpha is the average of all possible split-half coefficients). A corollary to this basic principle is that subtests will usually have lower reliability than a total score, because they have fewer items. It would be acceptable, for example, to implement a measure which has Cronbach alpha coefficients ranging from, say, .60 to .80, for the subtests, and .85 for the total score. One might be more cautious about interpreting data from the least internally consistent subtest, especially if it were the only one with significant results.
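The dependence of reliability on test length can be made concrete with the Spearman-Brown formula, which predicts the reliability of a test whose length is changed by a factor k: r_new = k·r / (1 + (k − 1)·r). The sketch below assumes the added or removed items are comparable to the existing ones; the function name and the numerical examples are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    length_factor = 2 corrects a split-half correlation to full length;
    length_factor < 1 predicts the reliability of a shortened scale.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Split-half correlation of .60, corrected to the full-length test.
full_length = spearman_brown(0.60, 2)
print(f"{full_length:.2f}")  # prints 0.75

# A 20-item total score with reliability .85, cut to a 5-item subtest.
subtest = spearman_brown(0.85, 5 / 20)
print(f"{subtest:.2f}")  # prints 0.59
```

The second example shows why a five-item subtest drawn from a scale with a total-score coefficient of .85 might be expected to fall near .60, consistent with the range described above.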

Reliability is also increased by heterogeneous samples. This occurs because heterogeneous samples produce a greater spread of scores, which will inflate the reliability coefficient. Therefore, a coefficient of .70 on a group of patients with advanced pancreatic cancer may be comparable to a coefficient of .80 obtained with a sample combining a spectrum of patients.

Validity-Validity refers to a scale's ability to measure what it purports to measure. A scale must be reliable in order to be valid, but it needn't be valid in order to be reliable. The ruler described above, for example, is certainly reliable; however, its validity must be clarified. As a measure of length, it can be demonstrated to be valid; in fact it is a standard against which other measures of length could be compared. However, it has limited validity as a measure of weight. Length and weight are two different physical constructs, one of which is perfectly measured by a ruler; the other of which can be estimated by a ruler. If we had no better measure of weight than a ruler, it could arguably be used as a reasonably valid approximation, but certainly not a gold standard. In some sense, this is where we find ourselves today in QOL measurement: without a gold standard, we are trying to approximate a concept we agree to be important and measurable. Data collected to substantiate this effort are validity data.

Three Types of Validity-Validity has generally been subdivided into three types: Content, criterion, and construct. Content validity is further divided into face validity (the degree to which the scale superficially appears to measure the construct in question) and true content validity (the degree to which the items accurately represent the range of attributes covered by the concept). Two things are important to understand about content validity.

First, because content validity does not include statistical evidence to support inferences made from tests (the central feature of validity), some authors [36] do not consider content evidence to be a true measure of a scale's validity.

Second, with a multidimensional concept such as QOL, content coverage should cut across at least three broad domains (ie, physical, psychological, and social) in order to be considered valid from the perspective of item content. The scale reviewer can evaluate this by examination of the development strategy for the scale as well as the actual content of the items themselves, which may or may not be reflected by subtest scores.

Criterion-related validity is also subdivided into two types, concurrent validity and predictive validity. The distinction between the two is a function of when the criterion data are collected. Criterion data that are collected simultaneously with the scale data provide evidence of concurrent validity. Criterion data that are collected some time after the assessment (eg, survival time; response to treatment; future answers to questionnaire) provide evidence for predictive validity. It is common to see scores on the self-report QOL measure in question correlated with another "standard" measure completed at the same time, with the correlation offered as evidence of concurrent validity. Generally, when the method of completion is the same and the timing is concurrent, one would seek coefficients only slightly below the square root of the product of the internal consistency coefficients for the reference and comparison scales, since measurement error caps the correlation that two imperfectly reliable scales can show. Similarly, test-retest reliability coefficients can be considered as upper bounds of predictive validity.
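One way to read the square-root guideline is as the classical attenuation bound from test theory, under which the observable correlation between two scales cannot exceed the square root of the product of their reliabilities. The sketch below assumes that reading; the function name and the coefficients are hypothetical.

```python
import math


def max_observable_correlation(reliability_x, reliability_y):
    """Upper bound on the correlation between two scales, given the
    reliability (eg, Cronbach's alpha) of each."""
    for r in (reliability_x, reliability_y):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_x * reliability_y)


# Hypothetical new scale (alpha = .85) compared with a reference scale (alpha = .80).
ceiling = max_observable_correlation(0.85, 0.80)
print(f"largest correlation one could expect: {ceiling:.2f}")
```

With these invented values the bound is a little above .82, so an observed concurrent correlation in the high .70s would already be close to the most that measurement error allows.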

Construct validity extends criterion-related validity into a broader arena in which the scale in question is tested against a theoretical model, and adjusted according to results that can in turn help refine theory [37]. There are many different approaches to construct validation. One is to examine a matrix of correlations between the scale in question and the following: other measures of the same concept; measures of related concepts; measures of unrelated concepts; and different methods of data collection (eg, self-report vs observer rating). This multitrait-multimethod matrix permits one to test for the presence of hypothesized high correlations (convergent validity) and hypothesized low correlations (discriminant validity).

Other contributions to construct validity can be derived from multidimensional scaling and factor analytic approaches which can confirm the presumed multidimensional nature of QOL. It might seem contradictory, but it may also help to conduct item analyses based upon a unidimensional scaling model for the overall measure as well as the component subtests, given the fact that QOL dimensions are intercorrelated.

The ability of an instrument to differentiate groups of patients expected to differ in QOL is also an important validation of its sensitivity. A "known groups technique" [38] can be employed in which patients with, for example, advanced disease are compared to those with limited disease to determine whether the QOL measure detects the differences known to exist between groups. The same could be done by comparing QOL scores of inpatients to outpatients, patients receiving adjuvant therapy to a clinically comparable group receiving no therapy, homebound patients to ambulatory patients, and so forth. Finally, the demonstration of an instrument's sensitivity or responsiveness to change over time parallel to changes in clinical status is an important example of its validity which can easily be neglected in early psychometric evaluations [39].

Like reliability, validity should not be considered to be an absolutely achieved status of a measure. Validity data are cumulative, requiring ongoing updates and refinements. Related to this, validity is relative in that a given measure might be valid (ie, sensitive) in one setting and not in another. Consider a measure which emphasizes activities of daily living skills and physical sensations. Such a measure may be valid in the context of metastatic breast cancer, but insensitive in early stage disease, where virtually all patients will score at the top of the scale. The potential for sample-dependent ceiling effects such as this (and floor effects in the reverse case) warrants caution when selecting the best instrument for a given population.
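A simple screen for the ceiling and floor effects described above is to tabulate the proportion of patients scoring at the extremes of the scale in the sample at hand. The sketch below is illustrative only; the function name, the 0 to 100 scoring range, and the scores are hypothetical.

```python
def ceiling_floor_rates(scores, min_score, max_score):
    """Return (proportion at the maximum, proportion at the minimum)."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_ceiling = sum(s == max_score for s in scores) / n
    at_floor = sum(s == min_score for s in scores) / n
    return at_ceiling, at_floor


# Hypothetical activities-of-daily-living scores in an early stage sample.
early_stage_scores = [100, 100, 95, 100, 100, 100, 90, 100]
ceiling_rate, floor_rate = ceiling_floor_rates(early_stage_scores, 0, 100)
print(f"at ceiling: {ceiling_rate:.0%}, at floor: {floor_rate:.0%}")
```

When most of a sample sits at the top score, as in this invented example, the measure has little room to register improvement and may miss real differences between treatment arms.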

Related to validity is the issue of meaningfulness of the data obtained. A comparison of treatment arms might indeed result in differences in QOL, but how much of a difference is clinically meaningful, as opposed to statistically significant? For seven-point Likert scaling of symptoms, Jaeschke et al [40] have suggested a difference of approximately 0.5 units per item as a minimal clinically important difference. For other types of scaling (eg, linear analog), Jacobson and Truax [41] recommend a Reliable Change Index that estimates whether a change measured is real vs a consequence of imprecise measurement.
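The Reliable Change Index can be sketched as follows, using the form usually attributed to Jacobson and Truax: the observed change divided by the standard error of the difference between two scores. The function name and the patient values are hypothetical, and the 1.96 cutoff is the conventional two-sided 5% criterion.

```python
import math


def reliable_change_index(score_before, score_after, sd_baseline, reliability):
    """Observed change divided by the standard error of the difference."""
    if sd_baseline <= 0 or not 0.0 <= reliability < 1.0:
        raise ValueError("need sd_baseline > 0 and 0 <= reliability < 1")
    # Standard error of measurement for a single administration.
    se_measurement = sd_baseline * math.sqrt(1 - reliability)
    # Standard error of the difference between two administrations.
    se_difference = math.sqrt(2 * se_measurement ** 2)
    return (score_after - score_before) / se_difference


# Hypothetical patient: linear analog score rises from 60 to 70 on a scale
# with a baseline standard deviation of 10 and a reliability of .80.
rci = reliable_change_index(60, 70, sd_baseline=10, reliability=0.80)
is_reliable = abs(rci) > 1.96
print(f"RCI = {rci:.2f}, reliable change: {is_reliable}")
```

In this invented case a 10-point gain yields an index of about 1.58, below the 1.96 criterion, so the change could plausibly reflect imprecise measurement rather than real improvement.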

Utility Approaches

Originally introduced by Bush and colleagues in 1973 [42], the utility approach adjusts survival time downward to a degree proportional to the amount of disability or toxicity endured. Variations on this statistical theme have been called "quality-adjusted life-years," or QALYs [43], "well-years" [44], and, most recently in the cancer-specific context, "quality-adjusted time without symptoms or toxicity," or Q-TWiST [45]. These approaches are most useful in health policy decision-making, or where two or more competing treatments or programs must be compared for relative effectiveness.

The utility approach to health status measurement evolved from a tradition of cost-benefit analysis, into cost-effectiveness approaches and, most recently, cost-utility approaches [46]. The cost-utility approach extends the cost-effectiveness approach conceptually by evaluating the QOL benefit produced by the clinical effects of a treatment, thereby including the (presumed) patient's perspective. To be used this way, QOL must be measured as a utility since, by definition, utilities can be multiplied by time to produce an adjusted time which is less than or equal to actual survival time.
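The multiplication of utilities by time can be shown with a brief sketch. It assumes that survival is divided into periods, each spent in a health state with a single utility between 0 and 1; the function name, durations, and utility weights are hypothetical.

```python
def quality_adjusted_time(periods):
    """Sum of duration x utility over (duration, utility) pairs."""
    total = 0.0
    for duration, utility in periods:
        if duration < 0 or not 0.0 <= utility <= 1.0:
            raise ValueError("need duration >= 0 and 0 <= utility <= 1")
        total += duration * utility
    return total


# Hypothetical course: 6 months on toxic therapy (utility .50), 18 months
# well (1.0), and 12 months with symptomatic disease (.75).
course = [(6, 0.5), (18, 1.0), (12, 0.75)]
adjusted_months = quality_adjusted_time(course)
actual_months = sum(duration for duration, _ in course)
print(f"{adjusted_months} quality-adjusted months of {actual_months} survived")
```

Because no utility exceeds 1, the adjusted total (30 months here) can never exceed the actual survival time (36 months).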

Two general cost-utility methods are the standard gamble approach and the time trade-off approach [47]. In the standard gamble approach, people are asked to choose between their current state of health and a "gamble" in which they have various probabilities for death or perfect health. The time trade-off method involves asking people how much time they would be willing to give up in order to live out their remaining life expectancy in perfect health. All utility approaches share in common the use of a 0 to 1 scale in which 0 = death and 1 = perfect health. In practice, most cost-utility analyses employ expert estimates of utility weights, or in some cases, weights provided by healthy members of the general public. It is often assumed that these weights are reasonable approximations of patient preferences. However, several studies have demonstrated that utilities obtained from patients are generally higher than those provided by physicians, which are, in turn, higher than utilities for the same health states obtained from healthy individuals [48].
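Both elicitation methods reduce to simple arithmetic once the respondent's point of indifference is known. The sketch below assumes the conventional scoring of each method; the function names and the responses are hypothetical.

```python
def time_tradeoff_utility(life_expectancy, time_given_up):
    """Utility of the current state: the fraction of remaining life the
    person would keep in exchange for perfect health."""
    if life_expectancy <= 0 or not 0 <= time_given_up <= life_expectancy:
        raise ValueError("need 0 <= time_given_up <= life_expectancy")
    return (life_expectancy - time_given_up) / life_expectancy


def standard_gamble_utility(p_perfect_health):
    """Utility of the current state: the probability of perfect health at
    which the person is indifferent between the gamble and the current state."""
    if not 0.0 <= p_perfect_health <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # Expected utility of the gamble, with perfect health = 1 and death = 0.
    return p_perfect_health * 1.0 + (1 - p_perfect_health) * 0.0


# Hypothetical respondent with 10 years of life expectancy who would give
# up 2 years to live in perfect health.
tto = time_tradeoff_utility(10, 2)

# Hypothetical respondent indifferent when the gamble offers a 90% chance
# of perfect health and a 10% chance of death.
sg = standard_gamble_utility(0.9)
print(f"time trade-off utility = {tto}, standard gamble utility = {sg}")
```

The two methods need not agree for the same person, which is one reason the source of the utility weights matters in cost-utility analyses.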

There are practical impediments to collection of utilities directly from patients, including the complexity of the concepts involved and the requirement for an interviewer-administered questionnaire (often unfeasible in the cooperative group setting). In addition, utility assessments provide little information on important disease and treatment-specific problems and are probably less sensitive to changes in health status over time than psychometric data [49-50]. Finally, the few studies that have been done involving simultaneous measurement of utilities and health status have found them at best to be moderately correlated, with measures of mood and depression correlating more highly than other measures with utilities [51].

A modified utility approach has been developed to evaluate the effectiveness of adjuvant chemotherapy for early stage breast cancer [52]. This approach, the quality-adjusted time without symptoms and toxicity (Q-TWiST), discounts survival time spent with toxicity or symptoms relative to disease-free survival off therapy. Thresholds for decision-making were determined by modeling actual survival data, and judgments were made by the investigators regarding where patient preferences were likely to fall relative to these threshold values. There is no theoretical reason that actual patient preference data could not be used in the Q-TWiST analyses or other studies of quality-adjusted survival. If the relationship between psychometric data and utilities can be established, it will become possible to collect psychometric data and base utility estimates on the reports of patients rather than the best guesses of others.
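The discounting and threshold logic can be sketched in a few lines. The example assumes the three-state form in which Q-TWiST is commonly described, with mean time in toxicity (TOX), time without symptoms or toxicity (TWiST), and time after relapse (REL); the function name, the mean durations, and the utility weights are all hypothetical and are not taken from the cited trial.

```python
def q_twist(tox_months, twist_months, relapse_months, u_tox, u_rel):
    """Q-TWiST = u_tox * TOX + TWiST + u_rel * REL, with TWiST weighted 1."""
    for u in (u_tox, u_rel):
        if not 0.0 <= u <= 1.0:
            raise ValueError("utility weights must lie between 0 and 1")
    return u_tox * tox_months + twist_months + u_rel * relapse_months


# Hypothetical mean months in each state for two treatment arms.
chemo = {"tox_months": 6, "twist_months": 40, "relapse_months": 8}
no_chemo = {"tox_months": 0, "twist_months": 36, "relapse_months": 14}

# Threshold-style analysis: see which arm is favored as the weights vary.
for u_tox, u_rel in [(0.5, 0.5), (0.0, 1.0)]:
    gain = q_twist(**chemo, u_tox=u_tox, u_rel=u_rel) - q_twist(
        **no_chemo, u_tox=u_tox, u_rel=u_rel
    )
    print(f"u_tox={u_tox}, u_rel={u_rel}: chemotherapy gain = {gain} months")
```

With these invented numbers chemotherapy is favored by 4 months when both weights are .50 but loses by 2 months when toxicity time counts for nothing and time after relapse counts fully, which illustrates why patient-derived weights could change the conclusion.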

Summary and Recommendations

Selecting patients and measures is an important first step in quality of life evaluation in cancer research and treatment. There is currently no "gold standard" or "best" quality of life measure. Quality of life is a subjective and fluid endpoint, so its measurement must include the patient's perspective, and be sensitive to change over time.

It is important to be aware of the strengths and weaknesses of available measures when setting out to study quality of life. While knowledge of existing measures is necessary, it is not sufficient in planning measurement strategies. Other methodological issues, like timing of administration, availability of personnel, responsibilities for quality control, and data management will arise in any study of quality of life. Below is a list of recommendations for selecting the appropriate QOL measure:

1. Avoid using the term quality of life when measuring only one dimension of the concept (eg, pain, distress, vocational functioning, performance status, nausea). Measurement should include at least three of the generally accepted components of QOL.

2. Be careful in defining the study questions. The QOL measure selected should derive from them, rather than vice-versa.

3. Select the measure according to the characteristics of the population to be studied. For example, when measuring quality of life in an elderly or incapacitated population, use a scale that measures activities of daily living. In the adjuvant chemotherapy setting with premenopausal women, the site-specific Breast Cancer Chemotherapy Questionnaire (BCQ) [53] or the linear analog scale of Selby and colleagues [54] would be better (see Tables 1A, 1B, and 1C and the Appendix). The scale should be sensitive enough to detect subtle changes if people will be followed during treatment.

4. Supplement an existing scale, chosen according to the above, with a few relevant and specific items tapping areas not included in the selected scale. This will provide both standardized assessment for comparison across disease sites and treatments, and specific information about problems unique to the individual patient group under study. When analyzing these results, it is best to keep separate the selected scale and the additional items, thereby retaining the psychometric integrity of the standardized test. Do not "pick and choose" items from existing scales without prior discussion with the scale author, and never assume that doing so retains measurement validity.

5. Remember that quality of life is more than absence of dysfunction or distress: it includes a sense of well-being and life satisfaction [10]. If the scale of choice does not specifically address these areas, they should be added.

6. When feasible, combine self-report with observer rating, since they so often do not match. Obtaining both perspectives can be particularly useful in validating new measures, where some correlation would be expected between measures, and yet the uniqueness of the perspectives can be demonstrated. Use an interview format with opportunity for clarification of question meaning whenever feasible. The data obtained with the added interview may be more valid than those obtained by questionnaire alone.

7. Measurement techniques should be as simple as possible. Complicated or lengthy forms often distort information given by people with a range of educational and cultural backgrounds.

8. The time frame of functioning specified in the questionnaire should be short, perhaps 1 to 2 weeks. Asking patients about a longer period of time increases bias due to memory loss and increases the tendency of people to respond according to their personality (trait) characteristics, rather than how they are actually doing at the time of questioning. A shorter time frame helps patients avoid confusing specific information with general complaints of dissatisfaction [55].

9. Consider the burden on the patient and the study personnel. For most multicenter research it is advisable to use questionnaires that require less than 15 minutes to complete. Select the number of repeat assessments judiciously. In addition, the measurement plan must be consistent with the treatment procedures within the institution. Availability of trained personnel, space, materials, and time must all be considered prior to initiating any study.

References:

1. Bailar JC, Smith EM: Progress against cancer? N Engl J Med 314(19):1226-1232, 1986.

2. Johnson JR, Temple R: Food and Drug Administration requirements for approval of new anticancer drugs. Cancer Treat Rep 69:1155-1157, 1985.

3. Department of Health and Human Services, Compilation of experimental cancer therapy protocol summaries, 6th ed. United States Government Publication, 1983.

4. O'Young J, McPeek B: Quality of life variables in surgical trials. J Chronic Dis 40:513-522, 1987.

5. Aaronson NK, Ahmedzai S, Bergman B, et al: The European Organization for the Research and Treatment of Cancer: A quality of life instrument for use in international clinical trials in oncology. J Natl Cancer Inst 85(5):365-376, 1993.

6. Aaronson NK, Bullinger M, Ahmedzai S: A modular approach to quality-of-life assessment in cancer clinical trials. Recent Results in Cancer Research, vol 111, pp 231-249. Berlin, Springer-Verlag, 1988.

7. Cella DF, Tulsky DS, Gray G, et al: The Functional Assessment of Cancer Therapy (FACT) Scale: Development and validation of the general version. J Clin Oncol 11(3):570-579, 1993.

8. Cella DF, Bonomi AE, Lloyd S, et al: Reliability and validity of the Functional Assessment of Cancer Therapy-Lung (FACT-L) quality of life instrument. Lung Cancer 12:199-220, 1995.

9. Cella DF, Bonomi AE: The Functional Assessment of Cancer Therapy (FACT) and Functional Assessment of HIV Infection (FAHI) quality of life measurement system, in Spilker B (ed): Quality of Life and Pharmacoeconomics in Clinical Trials. New York, Raven Press, in press.

10. Campbell A, Converse PE, Rodgers WL: The Quality of American Life, p 471. New York, Sage, 1976.

11. Aaronson NK: Quality of life: What is it? How should it be measured? Oncology 2(5):69-74, 1988.

12. Cella DF, Cherin EA: Quality of life during and after cancer treatment. Compr Ther 14(5):69-75, 1988.

13. Stewart AL, Ware JE, Brook RH: Advances in the measurement of functional status: Construction of aggregate indexes. Med Care 19:473-488, 1981.

14. Schipper H, Clinch J, McMurray A, et al: Measuring the quality of life of cancer patients: The Functional Living Index-Cancer: Development and validation. J Clin Oncol 2:472-483, 1984.

15. Cella DF: Measuring quality of life in palliative care (suppl 3). Semin Oncol 22(2):73-81, 1995.

16. McCorkle R, Packard N, Landenburger K: Subject accrual and attrition: Problems and solutions. J Psychosoc Oncol 2(3/4):137-146, 1985.

17. Cohen J: Statistical Power Analysis for the Behavioral Sciences. Orlando, Florida, Academic Press, 1977.

18. deHaes JCJM, Raatgever JW, van der Burg MEL, et al: Evaluation of the quality of life of patients with advanced ovarian cancer treated with combination chemotherapy, in Aaronson NK, Beckmann J (eds): The Quality of Life of Cancer Patients. New York, Raven Press, 1987.

19. Katz ST, Ford AB, Moskowitz RW, et al: Studies of illness in the aged: The index of ADL. JAMA 185:914-919, 1963.

20. Karnofsky DA, Burchenal JH: The clinical evaluation of chemotherapeutic agents in cancer, in MacLeod CM (ed): Evaluation of Chemotherapeutic Agents, pp 191-205. New York, Columbia University Press, 1949.

21. Zubrod CG, Schneiderman M, Frei E, et al: Appraisal of methods for the study of chemotherapy of cancer in man: Comparative therapeutic trial of nitrogen mustard and triethylene thiophosphoramide. J Chronic Dis 11:7-33, 1960.

22. Gough IR, Furnival CM, Schilder L, et al: Assessment of the quality of life of patients with advanced cancer. Eur J Cancer Clin Oncol 19:1161-1165, 1983.

23. Bernheim JL, Buyse M: The Anamnestic Comparative Self Assessment for measuring the subjective quality of life of cancer patients. J Psychosoc Oncol 1(4):25-38, 1984.

24. Yates JW, Edwards B: Practical concerns and pitfalls in measurement methodology (suppl 10). Cancer 53:2376-2379, 1984.

25. Ganz PA, Haskell CA, Figlin RA, et al: Estimating the quality of life in a clinical trial of patients with metastatic lung cancer using the Karnofsky Performance Status and the Functional Living Index-Cancer. Cancer 61:849-856, 1988.

26. Schipper H, Levitt M: Measuring quality of life: Risks and benefits. Cancer Treat Rep 69:1115-1123, 1985.

27. van Dam FSAM, Aaronson NK: Practical problems in conducting cancer-related psychosocial research, in Aaronson NK, Beckmann J (eds): The Quality of Life of Cancer Patients. New York, Raven Press, 1987.

28. Anderson JP, Bush JW, Berry CC: Classifying function for health outcome and quality of life evaluation: Self- versus interviewer modes. Med Care 24(5):454-469, 1986.

29. Cella DF, Jacobsen PB, Orav EJ, et al: A brief POMS measure of distress for cancer patients. J Chronic Dis 40(10):939-942, 1987.

30. Stewart AL, Hays RD, Ware JE: The MOS Short-form General Health Survey: Reliability and validity in a patient population. Med Care 26:724-735, 1988.

31. Silberfarb PM, Holland JCB, Anbar D, et al: Psychological response of patients receiving two drug regimens for lung carcinoma. Am J Psychiatry 140:110-111, 1983.

32. Coates A, Gebski V, Bishop JF, et al: Improving the quality of life during chemotherapy for advanced breast cancer. N Engl J Med 317:1490-1495, 1987.

33. Sugarbaker PH, Barofsky I, Rosenberg SA, et al: Quality of life assessment of patients in extremity sarcoma trials. Surgery 91:17-23, 1982.

34. Rowland JH, Glidewell OJ, Sibley RF, et al: Effects of different forms of central nervous system prophylaxis on neuropsychologic function in childhood leukemia. J Clin Oncol 2:1327-1335, 1984.

35. Nunnally JC: Psychometric Theory. New York, McGraw-Hill, 1967.

36. Messick S: The once and future issues of validity: Assessing the meaning and consequences of measurement, in Wainer H, Braun HI (eds): Test Validity. Hillsdale, NJ, Lawrence Erlbaum Associates, 1988.

37. Campbell DT, Fiske DW: Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull 56:85-105, 1959.

38. Bohrnstedt GW: Measurement, in Rossi PH, Wright JD, Anderson AB (eds): Handbook of Survey Research. New York, Academic Press, 1983.

39. Guyatt G, Walter S, Norman G: Measuring change over time: Assessing the usefulness of evaluative instruments. J Chronic Dis 40:171, 1987.

40. Jaeschke R, Singer J, Guyatt GH: Measurement of health status: Ascertaining the minimal clinically important difference. Control Clin Trials 10:407-415, 1989.

41. Jacobson NS, Truax P: Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 59:12-19, 1991.

42. Bush JW, Chen M, Patrick DL: Cost-effectiveness using a health status index: Analysis of the New York State PKU screening program, in Berg R (ed): Health Status Index, pp 172-208. Chicago, Hospital Research & Educational Trust, 1973.

43. Weinstein MC: Cost-effective priorities for cancer prevention. Science 221(4605):17-23, 1983.

44. Kaplan RM, Bush JW: Health-related quality of life measurement for evaluation research and policy analysis. Health Psychol 1:61-80, 1982.

45. Gelber RD, Goldhirsch A: A new end-point for the assessment of adjuvant therapy in postmenopausal women with operable breast cancer. J Clin Oncol 4:1772-1779, 1986.

46. Drummond MF, Stoddart GL, Torrance GW: Methods for Economic Evaluation of Health Care Programmes. Oxford, Oxford University Press, 1987.

47. Torrance GW: Measurement of health state utilities for economic appraisal: A review article. J Health Econ 5:1-30, 1986.

48. Boyd NF, Sutherland HJ, Heasman KZ, et al: Whose utilities for decision analysis? Med Decis Making 10:58-67, 1990.

49. Tsevat J, Goldman L, Soukup JR, et al: Stability of utilities in survivors of myocardial infarction. Med Decis Making 10:323, 1990.

50. Canadian Erythropoietin Study Group: Association between recombinant human erythropoietin and quality of life and exercise capacity of patients receiving hemodialysis. BMJ 300:573-578, 1990.

51. Tsevat J, Cook EF, Soukup JR, et al: Utilities of the seriously ill (abstract). Clin Res 39:589A, 1991.

52. Gelber RD, Goldhirsch A, Cavalli F: Quality-of-life-adjusted evaluation of adjuvant therapies for operable breast cancer. Ann Intern Med 114:621-628, 1991.

53. Levine MN, Guyatt GH, Gent M, et al: Quality of life in stage II breast cancer: An instrument for clinical trials. J Clin Oncol 6:1798-1810, 1988.

54. Selby PJ, Chapman JAW, Etazadi-Amoli J, et al: The development of a method for assessing the quality of life of cancer patients. Br J Cancer 50:13-22, 1984.

55. Huisman SJ, van Dam FSAM, Aaronson NK, et al: On measuring complaints of cancer patients: Some remarks on the time span of the question, in Aaronson NK, Beckmann JH (eds): The Quality of Life of Cancer Patients. New York, Raven Press, 1987.

56. Fetting J, Fairclough D, Gonin R, et al: Compliance with a quality of life (ql) evaluation in a cooperative group trial. Proc Am Soc Clin Oncol 13:1572, 1994.

57. Fairclough D, Fetting J, Wonson W, et al: Quality of life for breast cancer patients receiving CAF versus a 16-week multi-drug regimen as adjuvant therapy. Am Soc Clin Oncol Proc, May, 1995.

58. Heinrich RL, Schag CC, Ganz PA: Behavioral medicine approach to coping with cancer: A case report. Cancer Nurs 7:243-247, 1984.

59. Schag CC, Heinrich RL, Ganz PA: Cancer Inventory of Problem Situations: An instrument for assessing cancer patients' rehabilitation needs. J Psychosoc Oncol 1:11-24, 1983.

60. Schag CA, Heinrich RL: Developing a comprehensive tool: The CAncer Rehabilitation Evaluation System. Oncology 4:135-138, 1990.

61. Schag CAC, Heinrich RL: CAncer Rehabilitation Evaluation System (CARES). Manual, 1st ed, Los Angeles, CA, Cares Consultants, 1989.

62. Ganz PA, Hirji K, Sim MS, et al: Predicting psychosocial risk in patients with breast cancer. Med Care 31(5):419-431, 1993.

63. Schag CA, Ganz PA, Heinrich RL: CAncer Rehabilitation Evaluation System-Short Form (CARES-SF): A cancer-specific rehabilitation and quality of life instrument. Cancer 68(6):1406-1413, 1991.

64. Schag CAC, Ganz PA, Wing DS, et al: Quality of life in adult survivors of lung, colon, and prostate cancer. Qual Life Res 3:127-141, 1994.

65. The EuroQol Group: EuroQol: A new facility for the measurement of health-related quality of life. Health Policy 16:199-208, 1990.

66. Brooks R, Jendteg S, Lindgren B, et al: EuroQol: Health-related quality of life measurement: Results from the Swedish questionnaire exercise. Health Policy 18:25-36, 1991.

67. Nord E: EuroQol: Health-related quality of life measurement: Valuations of health states by the general public in Norway. Health Policy 18:25-36, 1990.

68. Aaronson NK, Ahmedzai S, Bullinger M, et al: The EORTC Study Core Quality of Life Questionnaire: Interim results of an international field study, in Osoba D (ed): Effect of Cancer on Quality of Life, pp 185-203. Boston, CRC Press, 1991.

69. Osoba D, Zee B, Warr D, et al: Psychometric properties and responsiveness of the EORTC Quality of Life Questionnaire (QLQ-C30) in patients with breast, ovarian, and lung cancer. Qual Life Res 3:143-154, 1994.

70. Cella DF, Lee-Riordan D, Silberman M, et al: Quality of life in advanced cancer: Three new disease-specific measures (abstract #1225). Proc Am Soc Clin Oncol 8:315, 1989.

71. Bonomi AE, Cella DF, Bjordal K, et al: Multi-lingual translation of the Functional Assessment of Cancer Therapy quality of life measurement system. Submitted for publication.

72. Clinch J: The Functional Living Index - Cancer: Ten years later, in Spilker B (ed): Quality of Life and Pharmacoeconomics in Clinical Trials. New York, Raven Press, in press.

73. Priestman TJ, Baum M: Evaluation of quality of life in patients receiving treatment for advanced breast cancer. Lancet 24:899-901, 1976.

74. Baum M, Priestman T, West RR, et al: A comparison of subjective responses in a trial comparing endocrine with cytotoxic treatment in advanced carcinoma of the breast (suppl 1). Eur J Cancer 16:223-226, 1980.

75. Coates AS, Dillenbeck FC, McNeil DR, et al: On the receiving end-II: Linear Analog Self-Assessment (LASA) in evaluation aspects of the quality of life of cancer patients receiving therapy. Eur J Cancer Clin Oncol 19:1633-1637, 1983.

76. Chambers LW, Macdonald LA, Tugwell P, et al: The McMaster Health Index Questionnaire as a measure of quality of life for patients with rheumatoid disease. J Rheumatol 9:780-784, 1982.

77. Chambers LW: The McMaster Health Index Questionnaire: An update, in Walker SR, Rosser RM (eds): Quality of Life Assessment: Key Issues in the 1990's, pp 131-149. London, Kluwer Academic Publishers, 1993.

78. Ware JE, Sherbourne CD, Davies AR: Developing and testing the MOS 20-item Short-Form Health Survey: A general population application, in Stewart AL, Ware JE (eds): Measuring Functioning and Well-Being: The Medical Outcomes Study Approach. Durham, NC, Duke University Press, 1992.

79. Stewart AL, Hays RD, Ware JE: The MOS Short-Form General Health Survey: Reliability and validity in a patient population. Med Care 30:473, 1992.

80. Ware JE, Sherbourne CD: The MOS 36-Item Short Form Health Survey (SF-36)-I: Conceptual framework and item selection. Med Care 30(6):473-483, 1992.

81. McHorney CA, Ware JE, Raczek AE: The MOS 36-Item Short Form Health Survey (SF-36)-II: Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 31(3):247-263, 1993.

82. McHorney CA, Ware JE, Lu JFR, et al: The MOS 36-Item Short Form Health Survey (SF-36)-III: Tests of data quality, scaling assumptions and reliability across diverse patient groups. Med Care 32(1):40-66, 1994.

83. Hays RD, Sherbourne CD, Mazel RM: The RAND 36-item Health Survey 1.0. Health Econ 2:217-227, 1993.

84. Hunt S, McKenna SP, McEwen J, et al: The Nottingham Health Profile: Subjective health status and medical consultations. Soc Sci Med 15A:221-229, 1981.

85. Hunt SM, Alonso J, Bucquet D, et al: Cross-cultural adaptation of health measures. Health Policy 19:33-34, 1991.

86. Wiklund I: The Nottingham Health Profile - A measure of health-related quality of life (suppl 1). Scand J Prim Health Care 15-18, 1990.

87. Lansky SB, List MA, Ritter-Sterr C, et al: Performance parameters in head and neck patients (abstract #603). Proc Am Soc Clin Oncol 7:156, 1988.

88. List MA, Ritter-Sterr C, Lansky SB: A performance status scale for head and neck cancer patients. Cancer 66(3):564-569, 1990.

89. Derogatis LR, Lopez M: PAIS & PAIS-R: Administration, Scoring and Procedures Manual. Baltimore, Clinical Psychometric Research, 1983.

90. Ferrans CE, Powers MJ: Quality of life index: Development and psychometric properties. Adv Nurs Sci 8(1):15-24, 1985.

91. Ferrans CE: Development of a quality of life index for patients with cancer. Oncol Nurs Forum 17(3):15-19, 1990.

92. Ferrans CE, Powers MJ: Psychometric assessment of the Quality of Life index. Res Nurs Health 15:29-38, 1992.

93. Spitzer WO, Dobson AJ, Hall J, et al: Measuring the quality of life of cancer patients: A concise QL-index for use by physicians. J Chronic Dis 34:585-597, 1981.

94. Padilla GV, Presant C, Grant MM, et al: Quality of life index for patients with cancer. Res Nurs Health 6:117-126, 1983.

95. Padilla GV: Validity of health-related quality of life subscales. Prog Cardiovasc Nurs 7(1):13-20, 1992.

96. Padilla GV, Grant MM, Ferrell BR, et al: Quality of life scale-cancer, in Spilker B (ed): Quality of Life and Pharmacoeconomics in Clinical Trials. New York, Raven Press, in press.

97. deHaes JCJM, Welvaart K: Quality of life after breast cancer surgery. J Surg Oncol 28:123-125, 1985.

98. de Haes JCJM, van Knippenberg FCE, Neijt JP: Measuring psychological and physical distress in cancer patients: Structure and application of the Rotterdam Symptom Checklist. Br J Cancer 62:1034-1038, 1990.

99. Watson M, Law M, Maguire GP, et al: Further development of a quality of life measure for cancer patients: The Rotterdam Symptom Checklist (revised). Psycho-oncology 1:35-44, 1992.

100. Bergner M, Bobbitt RA, Pollard WE: Sickness impact profile: Validation of a health status measure. Med Care 14:57-61, 1976.

101. Bergner M, Bobbitt RA, Carter WB, et al: The sickness impact profile: Development and final revision of a health status measure. Med Care 19:787-806, 1981.

102. Finlay AY, Khan GK, Luscombe D, et al: The sickness impact profile as a measure of health status of noncognitively impaired nursing home residents. Med Care 27:5157-5167, 1989.

103. De Bruin AF, De Witte LP, Diederiks JP: Sickness impact profile: The state of the art of a generic functional status measure. Soc Sci Med 8:1003-1014, 1992.

104. Chwalow AJ, Lurie A, Bean K, et al: A French version of the Sickness Impact Profile (SIP): Stages in the cross validation of a generic quality of life scale. Fundam Clin Pharmacol 6:319-326, 1992.

105. Moinpour CM: Quality of life assessment in Southwest Oncology Group clinical trials: Translating and validating a Spanish questionnaire, in Orley J, Kuyken W (eds): Quality of Life Assessment: International Perspectives, pp 83-97. Berlin, Springer-Verlag, 1994.

106. Moinpour CM, Savage M, Hayden KA, et al: Quality of life assessment in cancer clinical trials, in Dimsdale JE, Baum A (eds): Quality of Life in Behavioral Medicine Research. Hillsdale, NJ, Lawrence Erlbaum Associates, 1993.