The success of clinical trials and the relevance of their results to clinical practice depend on the choice of appropriate endpoints for the hypothesis being examined. In this context, endpoints must meet certain requirements. First, an endpoint must be measurable; second, it must be sensitive to the effect of treatment. Biologic activity can be used early in the development of a new agent to assess the potential for patient benefit, but in the longer term, proof of clinical efficacy is required for successful phase II and III clinical trials. Third, endpoints must be clinically relevant. Clinical relevance may be related to the outcome for an individual patient, or more commonly, it is related to a shift in the distribution of outcomes for an entire study population.
Statistical Endpoint Hierarchy
In a phase III trial setting (and in general), an endpoint hierarchy must be considered. The most preferable endpoint for any trial design is a true clinical efficacy measure. Overall survival (OS), historically the "gold standard" of phase III trials in oncology, is a true clinical efficacy measure and remains the most unambiguous and universally agreed upon primary endpoint. However, in most tumors, the use of OS as an endpoint requires considerable time for patient follow-up, carries a high associated cost, and thus delays the introduction of possibly beneficial therapy to patients. An endpoint that could be assessed more rapidly would address the trial hypotheses sooner, limit the number of patients required per study, and potentially expedite the drug approval process. Additionally, OS may be insensitive to a true beneficial drug effect when patients receive subsequent lines of therapy that themselves affect OS. Thus, identification of alternate but appropriate primary endpoints for assessing the effect of new treatment regimens on patient outcome has been central to the evolution of clinical research in oncology.
These considerations naturally lead to the search for validated surrogate endpoints that accurately predict clinical efficacy. Validated surrogate endpoints are obtained sooner, at less cost, or less invasively than the true endpoint of interest. Validated surrogates are rare, as the formal process of validation is burdensome and time-consuming, and sometimes, despite all efforts, simply fails. Efforts are ongoing to validate progression-free survival (PFS) as a surrogate for OS in palliative phase II and III trials in various tumor entities. The advantages of a validated surrogate, were it available, would make it an endpoint of choice in the drug development and approval process.
The next lower endpoint in the hierarchy (below a validated surrogate) is a surrogate endpoint that is reasonably likely to predict clinical benefit (but not formally validated). This has been the status of tumor response rate (RR) from the perspective of the US Food and Drug Administration (FDA). Because tumors rarely regress spontaneously, RR is considered a measure of clinical activity. There are sufficient data and literature to suggest that there is some patient benefit to achieving a response, and for this reason, RR has been used as the basis for an accelerated approval process for multiple chemotherapeutic agents in oncology.[2,3] However, in the era of biologic therapies, RR would not be an appropriate endpoint for a trial investigating a purported cytostatic or other novel chemotherapeutic agent that impacts tumor growth by nonconventional mechanisms. Additionally, surrogate endpoints that are reasonably likely to predict clinical benefit are rarely accepted as an endpoint for full drug approval.
At the bottom of the endpoint hierarchy is the great majority of other possible trial endpoints that are solely measures of biologic activity, with unclear clinical relevance. The relationship of these endpoints to patient outcome may not be fully understood, and therefore their use as a measure of clinical efficacy is questionable.
The overall goal associated with the use of a surrogate endpoint is to allow the same inference to be made as if the desired clinical endpoint had been used, but to allow this inference more quickly and at less cost. Various endpoints have historically been used as surrogates: tumor burden outcomes; time to progression (TTP, or progression-free survival [PFS]) and objective response rate (ORR); and biomarkers, such as carcinoembryonic antigen (CEA) in colon cancer, prostate-specific antigen (PSA) in prostate cancer, or CA-125 in ovarian cancer. While any of these endpoints can establish biologic activity, using one as a primary endpoint before it is understood whether it truly reflects clinical efficacy may result in an outcome of uncertain clinical relevance.
Surrogate endpoints fail for a variety of reasons. As outlined by Fleming and DeMets, a particular surrogate may not be intrinsic or active in the causal pathway of the disease process. The disease process may indeed affect the surrogate endpoint, but if the surrogate is not linked to the true clinical outcome, then the treatment may impact the surrogate endpoint without this impact translating into efficacy as measured by the true clinical endpoint (Figure 1A). For example, the tumor marker CEA has been used as a surrogate marker in advanced colon cancer. However, it remains unclear whether a treatment that impacts CEA levels will in turn impact clinically relevant outcome measures such as OS.
Surrogate endpoints may also fail if the disease impacts the true clinical outcome via multiple processes, but the surrogate endpoint is contained in only one of those processes (Figure 1B). The intervention may affect the disease pathway containing the surrogate endpoint, but not other relevant pathways. In this instance, a large effect of the treatment on the surrogate can translate to only a small effect on the true clinical outcome, and the surrogate may therefore suggest a false-positive result.
Alternatively, the surrogate endpoint may either not be in the pathway of the intervention's effect, or may be insensitive to its effect (Figure 1C). The intervention could have no effect on the surrogate endpoint, but still impact the true clinical outcome. This would, in turn, lead to a false-negative result. This scenario is particularly germane to trials investigating cytostatic agents, and it demonstrates why the use of RR as a surrogate endpoint could miss a beneficial effect on clinically relevant patient outcomes such as OS.
Finally, the use of a surrogate endpoint may fail when the intervention has multiple mechanisms of action related to the true clinical outcome, and these mechanisms of action may (or may not) include the surrogate endpoint (Figure 1D). For example, the surrogate endpoint of RR will not detect the impact of a grade 5 adverse event on OS, as in this case the therapy negatively impacts the true clinical outcome independently of the surrogate.
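The four failure modes above can be made concrete with a deliberately simple toy model. The sketch below is an illustration only, not a method from Fleming and DeMets: it assumes the treatment's effect on the true outcome is the sum of a part carried through the surrogate and a part that bypasses it. The class `Scenario`, the functions `true_effect` and `classify`, the 0.05 decision threshold, and every numeric value are hypothetical and chosen purely to reproduce the qualitative pattern of each scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Toy description of one surrogate/outcome relationship (all values hypothetical)."""
    name: str
    effect_on_surrogate: float   # treatment effect seen on the surrogate (e.g., gain in RR)
    surrogate_to_outcome: float  # fraction of that effect carried through to the true outcome
    effect_off_pathway: float    # effect on the true outcome that bypasses the surrogate


def true_effect(s: Scenario) -> float:
    """Effect on the true clinical outcome: on-pathway part plus off-pathway part."""
    return s.effect_on_surrogate * s.surrogate_to_outcome + s.effect_off_pathway


def classify(s: Scenario, threshold: float = 0.05) -> str:
    """Compare the conclusion drawn from the surrogate with the one drawn from the true outcome."""
    surrogate_positive = s.effect_on_surrogate >= threshold
    outcome_positive = true_effect(s) >= threshold
    if surrogate_positive and not outcome_positive:
        return "false positive"
    if outcome_positive and not surrogate_positive:
        return "false negative"
    return "concordant"


# Illustrative scenarios loosely mirroring Figure 1A-1D, plus a well-behaved surrogate.
SCENARIOS = [
    Scenario("1A: surrogate outside causal pathway", 0.30, 0.0, 0.0),
    Scenario("1B: surrogate in one of several pathways", 0.30, 0.1, 0.0),
    Scenario("1C: surrogate insensitive (cytostatic agent)", 0.0, 0.5, 0.20),
    Scenario("1D: off-pathway harm (fatal toxicity)", 0.30, 0.5, -0.25),
    Scenario("well-behaved surrogate", 0.30, 0.5, 0.0),
]

if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"{s.name}: {classify(s)}")
```

Under these made-up numbers, scenarios 1A, 1B, and 1D are classified as false positives and scenario 1C as a false negative, while the last scenario is concordant. The additive structure is a simplifying assumption; real surrogate validation relies on trial-level and patient-level data rather than a fixed formula.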