Arshiya Mariam, BS, and colleagues report the findings of a large meta-analysis assessing the ability of various biomarkers to predict responses to immune checkpoint inhibition.
BACKGROUND: Immune checkpoint inhibitors (ICIs) that block PD-1/PD-L1 have consistently demonstrated durable clinical activity across multiple histologies but have low overall response rates for many cancers, indicating that too few patients benefit from ICIs. Although many studies have explored potential predictive biomarkers (eg, PD-1/PD-L1 expression, tumor mutational burden [TMB]), no consensus biomarker has been identified.
METHODS: This meta-analysis combined predictive accuracy metrics for various biomarkers, across multiple cancer types, to determine which biomarkers are most accurate for predicting ICI response. Data from 18,792 patients from 100 peer-reviewed studies that evaluated putative biomarkers for response to anti–PD-1/anti–PD-L1 treatment were meta-analyzed using bivariate linear mixed models. Biomarker performance was assessed based on the global area under the receiver operating characteristic curve (AUC) and 95% bootstrap confidence intervals.
RESULTS: PD-L1 immunohistochemistry, TMB, and multimodal biomarkers discriminated responders and nonresponders better than random assignment (AUCs >.50). Excluding multimodal biomarkers, these biomarkers correctly classified at least 50% of the responders (sensitivity 95% CIs, >.50). Notably, variation in biomarker performance was observed across cancer types.
CONCLUSIONS: Although some biomarkers consistently performed better, heterogeneity in performance was observed across cancer types, and additional research is needed to identify highly accurate and precise biomarkers for widespread clinical use.
Immune checkpoint inhibitors (ICIs) are becoming a cornerstone of cancer therapy across multiple histologies.1,2 ICIs that block PD-1 or PD-L1 are at the forefront of ICI clinical implementation. These therapies reactivate the immune response to tumor cells by inhibiting the interaction of PD-L1 and PD-1, and multiple studies have demonstrated their clinical benefit over standard treatments.3-7 Although ICIs show evidence of durable clinical benefit for individuals who respond, the objective response rate (ORR) to anti–PD-1/anti–PD-L1 therapies is approximately 24% (95% CI, 21%-28%).2 Approximately 16% (95% CI, 12%-21%) of patients also experience significant toxicity, including colitis and endocrine organ dysfunction.2 It is critical that biomarkers for ICIs are robustly predictive to better guide clinical decision-making.
Many studies have explored whether PD-L1 or PD-1 protein expression,8-11 tumor mutational burden (TMB),12-16 and, more recently, immune-mediated adverse events17 (imAEs) and the microbiome signature,18-20 can discriminate between responders and nonresponders to anti–PD-1/anti–PD-L1 immunotherapies. The results from these studies are often inconsistent. For example, Bellmunt et al reported that a PD-L1 expression threshold above 10% discriminated responders among patients with urothelial bladder cancer,21 whereas Massard et al reported a threshold of 25% for the same cancer type.22 Differences in patient populations, sample collection and processing, technology platforms, biomarker thresholds, and the specific ICI used all contribute to high variability across studies. In addition to methodological differences, many studies also have limited sample sizes that may impact statistical power for discovering biomarkers. Although most reviews qualitatively condense information across studies, biomarker performances are not always summarized in a quantitative manner.23,24 Meta-analysis is an approach to developing consensus on important clinical questions from previously published literature, and it provides an opportunity to obtain relevant statistical summaries for potential ICI biomarkers.2 An additional benefit of meta-analyses is that biomarkers can be concurrently evaluated across different treatments, threshold values, and cancer types. Here, we conducted the largest meta-analysis of predictive biomarkers for ICI therapy to date, including 100 peer-reviewed studies with data from 18,792 patients. We also investigated whether some emerging biomarkers, such as the microbiome signature or imAEs, show promise for clinical utility. Furthermore, we implemented a robust statistical approach that went beyond reporting which biomarkers displayed the highest predictive accuracy.
The objective of this study is to provide a comprehensive evaluation of the current state of predictive utility for the most common biomarkers, and some emerging ones, for ICI treatment response.
Literature Search and Inclusion Criteria
PubMed and Google Scholar were searched for peer-reviewed manuscripts and conference abstracts focused on anti–PD-1/anti–PD-L1 therapies and biomarkers. Search keywords included “anti–PD-1/anti–PD-L1 therapies and tumor mutational burden,” “anti–PD-1/anti–PD-L1 therapies and AEs,” “anti–PD-1/anti–PD-L1 therapies and biomarkers,” and “biomarkers for immune checkpoint inhibitors.” Studies were selected based on the availability of summary-level or patient-level data on clinical outcomes and predictive biomarkers. A PRISMA 2020 checklist detailing the quality assessment for including studies in the meta-analysis is provided in Supplementary File 1.
For each study, the title, publication year, treatment, type of cancer, biomarker, and clinical outcome details were documented by 3 separate reviewers (Supplementary File 2). Any discrepancies in collected data were reviewed by all reviewers and reconciled by consensus. ORR was considered the primary clinical outcome, and clinical benefit (CB) was used if ORR was not available. Responses were determined using RECIST, immune-related response criteria, or modified RECIST3,25 by investigator assessment or independent review. If a response was evaluated using multiple tumor criteria or by multiple assessors, the means of data were rounded to the nearest integer. The thresholds for biomarker activity were accepted as defined in each study.
The following metrics for biomarker performance were calculated: sensitivity, specificity, false positive rate, and false negative rate (Supplementary Table 1). Each of these metrics can be calculated from a 2 × 2 contingency table, where counts of individuals meeting the criterion for having a positive or negative result for a biomarker and having a positive or negative result for the clinical outcome can be tabulated. Only studies that provided either individual counts for each cell in the 2 × 2 table or the necessary individual-level information to complete the 2 × 2 table were included. Studies that did not propose a threshold or cutoff value for the biomarker were excluded unless participant-level data were available from which a 2 × 2 table could be developed.
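As a minimal sketch of how the four metrics follow from a 2 × 2 table, the Python example below uses the standard textbook definitions. The class name, field names, and counts are hypothetical and chosen only for illustration; they are not taken from Supplementary Table 1 or from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2 x 2 table of biomarker status against clinical response."""

    tp: int  # biomarker-positive responders
    fp: int  # biomarker-positive nonresponders
    fn: int  # biomarker-negative responders
    tn: int  # biomarker-negative nonresponders

    def sensitivity(self) -> float:
        # Fraction of responders who are biomarker positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of nonresponders who are biomarker negative.
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        # Fraction of nonresponders who are biomarker positive (1 - specificity).
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        # Fraction of responders who are biomarker negative (1 - sensitivity).
        return self.fn / (self.fn + self.tp)


# Hypothetical counts for illustration only.
example = ContingencyTable(tp=30, fp=40, fn=20, tn=110)
print(f"sensitivity         = {example.sensitivity():.2f}")
print(f"specificity         = {example.specificity():.2f}")
print(f"false positive rate = {example.false_positive_rate():.2f}")
print(f"false negative rate = {example.false_negative_rate():.2f}")
```

Because the false positive and false negative rates are the complements of specificity and sensitivity, a study that reports all four cell counts provides everything needed for the analyses described here.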
Across all included studies, 9 classes of biomarkers were investigated (Supplementary Table 2). The 3 most frequently observed biomarkers were PD-L1 protein expression, TMB, and multimodal biomarkers. Interest in AEs and the microbiome signature as potential biomarkers has emerged more recently. Specific details regarding each of the 9 biomarker classes are described below.
PD-L1 protein expression. PD-L1 protein expression measured on tumor cells, immune cells, or both, was included. Each study provided an expression threshold that was used to evaluate observed clinical responses. Patients with PD-L1 expression greater than the threshold were expected to be more likely to respond to treatment. PD-L1 expression was further divided into (1) immunohistochemistry (IHC) and (2) multiplex immunohistochemistry/immunofluorescence (mIHC/IF) assays.
TMB. TMB refers to the number of somatic DNA mutations across the tumor genome. Since the early TMB studies, many variations of this biomarker have been studied. TMB has been quantified based on nonsynonymous single nucleotide variants,26,27 frameshift mutations,28 and circulating tumor DNA,29 and studies calculating TMB from whole exome or whole genome sequencing were included. Median TMB was a commonly reported threshold for assessing response to ICIs. The TMB threshold defined by the authors of each study was used for this analysis, except for that of Hugo et al,27 which did not report a threshold; here, the authors used the median TMB. For all studies, TMB was evaluated to determine if being above the threshold was indicative of an increased likelihood of response to treatment.
T cell–related gene signatures (TGSs). Four studies evaluated sets of gene expression for association with response to treatment. Wang et al developed an epithelial-mesenchymal transition–related gene expression correlated with T-cell infiltration and predictive of response to treatment.30 Other gene expressions related to T-cell inflammation were calculated from total RNA and mRNA. PD-L1 and CXCL9 were commonly included genes.31,32
CD8+. CD8+ tumor-infiltrating lymphocytes are involved in the immune response to the tumor and have been linked to improved overall survival in esophageal cancer33 and urothelial cancer.30 Of the 3 studies included in the analysis, results from 2 studies reported improved ORRs and prolonged overall survival with higher CD8+ infiltration.30,33
Microbiome signature. Three studies investigated the relationship between microbiome signature and ICI response. Individuals with gut and oral commensal microbiome signatures that promote antitumor immunity have been shown to benefit more from ICI treatments than others.20,34 Conversely, downregulation of these microbiome signatures by antibiotics has been linked to worse treatment responses.18 Commensal bacterial species implicated in response included Akkermansia muciniphila, Bifidobacterium longum, Collinsella aerofaciens, and Enterococcus faecium.20 The predictive thresholds established by the authors for these studies were utilized for this analysis.
AEs of special interest and imAEs. Unlike other biomarkers, which are assessed prior to treatment initiation, adverse events of special interest (AESIs) and imAEs are observed after the administration of ICIs but prior to the determination of clinical response. AESIs comprised a variety of events including autoimmune events, rash, and diarrhea. imAEs were defined as AESIs that required treatment with systemic or topical corticosteroids.17 These data were previously reported in Maher et al, which combined data from 7 trials submitted to the FDA,17 and we previously reported the discriminatory potential for these biomarkers.35 AESIs and imAEs were evaluated to determine if their occurrence was indicative of an increased likelihood of response to treatment.
Multimodal biomarkers. The discriminatory potential of biomarker combinations has been investigated in a few studies that collectively investigated 3 cancer types: melanoma, non–small cell lung cancer (NSCLC), and head and neck cancer.31,36 The following combinations of multimodal biomarkers are presented here: (1) TMB and PD-L1 IHC (4 studies), (2) TMB and TGS (1 study), and (3) PD-L1 IHC and PD-1 IHC (1 study).
International Metastatic RCC Database Consortium (IMDC) risk score. This scoring method is used to predict prognosis and recommend first-line therapies for patients with renal cell carcinoma (RCC) only.37,38 Disease is categorized as favorable, intermediate, and poor risk based on the presence of 0, 1 to 2, and 3 or more risk factors, respectively.
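The IMDC categorization described above can be sketched as a simple mapping from the number of risk factors to a risk group. The function name is hypothetical, and the example does not attempt to define the individual risk factors, which are not listed in this article.

```python
def imdc_risk_category(n_risk_factors: int) -> str:
    """Map a count of IMDC risk factors to a risk group.

    0 factors -> favorable, 1 to 2 -> intermediate, 3 or more -> poor.
    """
    if n_risk_factors < 0:
        raise ValueError("number of risk factors cannot be negative")
    if n_risk_factors == 0:
        return "favorable"
    if n_risk_factors <= 2:
        return "intermediate"
    return "poor"


for count in range(5):
    print(count, imdc_risk_category(count))
```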
Biomarker performance metrics (Supplementary Table 1) were calculated for each study, and various groups were meta-analyzed for comparison using the R package mada.39 Meta-analyses were conducted to determine (1) discriminatory potential for each biomarker across multiple cancer types, and (2) discriminatory potential for each biomarker for each cancer type. Binary test outcomes, such as sensitivity and specificity, rely on a threshold for determining the optimal test performance. This threshold often creates a tradeoff between sensitivity and specificity, and simply averaging values across studies with different thresholds can confound results.40 To address this, we implemented the summary receiver operating characteristic curve approach,41,42 which performs bivariate analyses using a linear mixed effects model. We separately evaluated specificity and sensitivity. A minimum of 3 studies, or 500 patients, were required to perform each meta-analysis. For biomarkers that did not meet this inclusion criterion, the results of the individual studies are described for context, but they were not meta-analyzed. If a study reported multiple thresholds for the same biomarker, only the results of the threshold with the greatest balanced accuracy were included in the meta-analysis. The area under the curve (AUC) estimate was calculated from the extrapolated bivariate models. CIs for AUCs were estimated based on 10,000 bootstrap iterations.43
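The two preprocessing rules above (one threshold per study, chosen by balanced accuracy, and a minimum amount of data per meta-analysis) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: balanced accuracy is taken as the mean of sensitivity and specificity, the "3 studies, or 500 patients" rule is read as satisfying either condition, and all names and numeric values are hypothetical.

```python
from typing import NamedTuple, Sequence


class ThresholdResult(NamedTuple):
    threshold: str
    sensitivity: float
    specificity: float


def balanced_accuracy(result: ThresholdResult) -> float:
    # Assumed definition: mean of sensitivity and specificity.
    return (result.sensitivity + result.specificity) / 2.0


def select_threshold(results: Sequence[ThresholdResult]) -> ThresholdResult:
    # Keep only the threshold with the greatest balanced accuracy.
    return max(results, key=balanced_accuracy)


def eligible_for_meta_analysis(
    n_studies: int,
    n_patients: int,
    min_studies: int = 3,
    min_patients: int = 500,
) -> bool:
    # Assumed reading of the rule: either condition is sufficient.
    return n_studies >= min_studies or n_patients >= min_patients


# Hypothetical results for one study reporting three PD-L1 thresholds.
reported = [
    ThresholdResult(">=1%", sensitivity=0.80, specificity=0.35),
    ThresholdResult(">=10%", sensitivity=0.60, specificity=0.65),
    ThresholdResult(">=50%", sensitivity=0.35, specificity=0.85),
]
chosen = select_threshold(reported)
print(chosen.threshold, round(balanced_accuracy(chosen), 3))
print(eligible_for_meta_analysis(n_studies=2, n_patients=120))
```

Selecting a single threshold per study keeps each study from contributing several correlated points to the same bivariate model.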
After performing quality control, 100 of 197 studies published from 2010 to 2021 met the inclusion criteria. ORR and CB were reported in 85% and 8% of studies, respectively. The descriptive statistics for the studies are provided in Supplementary Figure 1. The most frequent cancer types in the data set were NSCLC (29.5%) and melanoma (22.1%) (Supplementary Table 2). The most frequently investigated biomarker was PD-L1 expression (76%; Supplementary Table 3). Below, we present the overall characterization of each biomarker followed by the meta-analysis results by cancer type. The meta-analysis results across all cancer types and other analyses are shown in Supplementary Tables 4 and 5. All of the included studies are listed in Supplementary Table 6.
Overall Biomarker Performance
Of 9 defined classes of biomarkers, 6 met the criteria for the number of studies or samples to be meta-analyzed. AESIs/imAEs, microbiome signature, and IMDC were not meta-analyzed and were considered separately because they were investigated in 1, 2, and 2 studies, respectively. Three biomarkers—TMB, PD-L1 IHC, and mIHC/IF—correctly classified at least 50% of the responders (sensitivity, 95% CIs >0.50) (Figure 1). Sensitivities for PD-L1 IHC (n = 76) and TMB (n = 15) were estimated to be 0.60 (95% CI, 0.55-0.64) and 0.59 (95% CI, 0.52-0.66), respectively. mIHC/IF was the most sensitive (sensitivity, 0.75; 95% CI, 0.53-0.89); however, it has been investigated only in 3 studies. mIHC/IF (AUC, 0.71; 95% CI, 0.63-0.83) closely followed by TMB (AUC, 0.68; 95% CI, 0.64-0.72) had the highest AUCs. PD-L1 IHC discriminated marginally better than random assignment, with an AUC of 0.63 (95% CI, 0.61-0.65).
NSCLC
PD-L1 IHC (29 cohorts), TMB (14 cohorts), and multimodal biomarkers (3 cohorts) were meta-analyzed. PD-L1 IHC was the most sensitive (sensitivity, 0.63; 95% CI, 0.57-0.68) and demonstrated moderate specificity (specificity, 0.63; 95% CI, 0.55-0.70). The sensitivity of TMB was slightly lower and varied more across studies (sensitivity, 0.60; 95% CI, 0.47-0.70). However, both PD-L1 IHC and TMB were consistently accurate in their classification of patients (AUCs >0.50) (Figure 2).
Melanoma
PD-L1 IHC (15 cohorts), TMB (6 cohorts), mIHC/IF (5 cohorts), and multimodal biomarkers (4 cohorts) were meta-analyzed. Unlike NSCLC, TMB was more sensitive in melanoma (sensitivity, 0.73; 95% CI, 0.64-0.81) than PD-L1 IHC (sensitivity, 0.58; 95% CI, 0.44-0.71). It was also most accurate in classifying both responders and nonresponders (AUC, 0.74; 95% CI, 0.61-0.88). Multimodal biomarkers had moderate overall accuracy (AUC, 0.65; 95% CI, 0.63-0.74) and accurately discriminated more than 50% of responders and nonresponders. The sensitivity of mIHC/IF was similar to that of PD-L1 IHC (sensitivity, 0.62; 95% CI, 0.43-0.77). The microbiome signature was investigated only in melanoma. Gut microbiome signature and buccal microbiome signature for response were investigated in 2 studies and 1 study, respectively. Their sensitivities were low (range, 0.22-0.40) and specificities were high (range, 0.67-1.00).
Urothelial cancer
Only PD-L1 IHC had sufficient studies to perform meta-analysis (9 cohorts). TGS and AESIs/imAEs were both investigated in only a single cohort. PD-L1 IHC was marginally better at discriminating responders and nonresponders than random assignment (AUC, 0.68; 95% CI, 0.63-0.71) (Figure 2), with a sensitivity of 0.53 (95% CI, 0.35-0.70) and specificity of 0.70 (95% CI, 0.61-0.77). AESIs/imAEs demonstrated poor sensitivity (0.36). TGS performed poorly at discriminating responders and nonresponders (sensitivity, 0.51; specificity, 0.50).
Head and neck cancer
PD-L1 IHC was the only biomarker examined in a sufficient number of studies to perform a meta-analysis (8 cohorts). It detected responders with a sensitivity of 0.65 (95% CI, 0.50-0.77). The discriminatory ability of PD-L1 IHC was similar between this cancer type and others investigated (AUC, 0.61; 95% CI, 0.57-0.67). Multimodal biomarkers and TMB were each investigated in 2 cohorts. Multimodal biomarkers consistently classified more than 50% of both responders and nonresponders correctly (sensitivities, >50%; specificities, >50%) (Figure 2).
RCC
PD-L1 IHC (7 cohorts) and TMB (6 cohorts) were meta-analyzed. Each of these biomarkers had a similar discriminatory ability; however, PD-L1 IHC was more sensitive in response prediction (0.60 vs 0.49) (Figure 2). The AUC estimates for PD-L1 IHC and TMB were 0.59 (95% CI, 0.57-0.65) and 0.59 (95% CI, 0.55-0.64), respectively (Figure 2). IMDC was investigated only in RCC, and sensitivities of favorable scores ranged between 0.28 and 0.30, with moderate to high specificities between 0.57 and 0.86 (Supplementary File 2). Sensitivity improved from 0.28 to 0.90 and specificity decreased from 0.86 to 0.24 when intermediate was used as the threshold instead of favorable.38
ICIs targeting PD-1/PD-L1 have resulted in breakthrough treatments for a multitude of cancers, and the impact this class of drugs has on cancer treatment cannot be overstated. Despite the successes, however, only 24% (95% CI, 21%-28%) of patients respond to these treatments.2 A variety of biomarkers have been considered, but no consensus exists regarding which of these biomarkers is capable of or has the potential to be clinically useful. This broad-based meta-analysis addresses the unmet need of characterizing commonly considered biomarkers for ICI treatment response in various cancer types. Most recently, Lu et al44 provided a characterization of biomarker performance, and we have expanded the scope of this investigation in several ways. We included more studies (100 vs 46) and, consequently, more patients (18,792 vs 8135). We also expanded the range of included biomarkers, including novel biomarkers (eg, microbiome signature and imAEs/AESIs). Methodological differences included our implementation of bivariate linear mixed models, which have been shown to provide more accurate estimates compared with estimating sensitivity and specificity separately.44 Clinical response to ICI was defined to improve consistency across studies and consisted only of ORR, CB, and PFS (6 months).
PD-L1 IHC, TMB, and mIHC/IF were moderately sensitive to ICI response when summarized across all investigated cancer types. Consistent with Lu et al,44 these biomarkers also had better discriminatory ability than random assignment (AUCs >0.50). Overall, TMB had better discriminatory ability than PD-L1 expression (Figure 1). Other studies have also reported that TMB better predicted response to ICI than PD-L1 IHC.45 Although relatively few studies investigating mIHC/IF and multimodal biomarkers have been performed, our results and those presented by Lu et al44 demonstrated that both of these biomarkers show promise that warrants additional investigation. Because mIHC/IF has been investigated only in 2 cancer types (melanoma and Merkel cell carcinoma), its performance in other cancer types is yet to be determined.
We also investigated biomarker performance across the 5 most common cancer types evaluated. PD-L1 IHC, TMB, and multimodal biomarkers were the only biomarkers meta-analyzed in more than 1 cancer type. Zhang et al reported greater response in PD-L1–positive subgroups compared with PD-L1–negative subgroups in melanoma, NSCLC, and RCC.2 Better ORR, albeit to a smaller degree, was also observed when multiple cancer types were analyzed together.2 In addition to PD-L1 IHC, TMB also discriminated responders and nonresponders better than a random assignment in these cancer types as well as across all cancers (AUCs > 0.50) (Figure 2). PD-L1 IHC was the only consistently sensitive biomarker in NSCLC and RCC subgroups. On the other hand, TMB and multimodal biomarkers were consistently sensitive in the melanoma subgroup (Figure 2). Multimodal biomarkers were investigated in 3 and 4 cohorts in NSCLC and melanoma, respectively. Their discriminatory ability was more consistent across studies of melanoma (AUC, 0.65; 95% CI, 0.63-0.74) compared with studies of NSCLC (AUC, 0.71; 95% CI, 0.50-0.81). While meta-analytical summaries of results account for heterogeneity among studies,41 given the small number of studies, additional research is needed to rule out factors other than cancer type contributing to heterogeneity. In the case of PD-L1 IHC alone, tumor type, observing pathologist, assay type, and nonuniform evaluation of tumor microenvironments were reported to impact the efficacy of PD-L1 IHC.44 It is also important to note that the thresholds used are inconsistent across studies and may not align with those currently used in clinical practice. For example, in many contexts, TMB of more than 10 mut/Mb is the approved FDA metric; however, thresholds as low as 6 mut/Mb and as high as 248 mut/Mb have been used in these studies. These studies often do not report individual-level TMB, eliminating the possibility of deriving alternative thresholds for analysis. Determining an optimal threshold for TMB biomarker performance will be an important line of investigation in future studies.
AEs and the microbiome signature have recently emerged as potential biomarkers. imAEs/AESIs are distinctive because they are ascertained after treatment initiation, limiting their potential use as pretreatment biomarkers. However, if determined to be effective, they could still serve as leading indicators of response, providing opportunities to modify or enhance treatment. To our knowledge, AEs and responses to ICIs have been explored only in patients with urothelial cancer.17 AEs and the microbiome signature were found to have low sensitivities (<0.40) for detecting responding individuals. Defining these biomarkers with a different criterion in a different cancer type, or using these in conjunction with an imprecise biomarker, may lead to improved discrimination. Matson et al reported high sensitivity and specificity using a microbiome signature in multiple cancers (Supplementary File 2). An important metric, not reported here, is the positive predictive value (PPV). PPV is a measure of the probability of the outcome given a positive biomarker result. However, PPV is influenced by the prevalence of responders and is therefore highly dependent on tumor type and many other factors. It will be important in follow-up studies that investigate specific use cases of these biomarkers to consider PPV. The results we have presented here also justify the investigation of these biomarkers in other cancer types and potentially in response prediction for other ICIs. The results of these constituent studies should be prospectively validated in an independent cohort for prediction in the future.
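To illustrate why PPV depends on responder prevalence, the sketch below applies Bayes' rule to a fixed sensitivity and specificity. The function name is hypothetical. The sensitivity and specificity of 0.63 are the pooled PD-L1 IHC estimates reported here for NSCLC, and 0.24 is the overall ORR quoted in this article; the other two prevalence values are assumptions chosen only to show the trend, and none of the resulting PPVs is a result of this meta-analysis.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability of response given a positive biomarker result (Bayes' rule)."""
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Same test characteristics, different responder prevalences.
for prevalence in (0.10, 0.24, 0.40):
    ppv = positive_predictive_value(0.63, 0.63, prevalence)
    print(f"prevalence = {prevalence:.2f} -> PPV = {ppv:.2f}")
```

With sensitivity and specificity held constant, PPV rises as the proportion of responders rises, which is why a single PPV cannot be carried over from one tumor type to another.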
mIHC/IF, multimodal biomarkers, TMB, and PD-L1 IHC adequately captured responders and nonresponders across all included cancer types. Between the 2 most frequently investigated biomarkers, TMB outperformed PD-L1 IHC when all cancers were combined. These 2 also adequately captured responders and nonresponders across NSCLC and melanoma. The results for multimodal biomarkers were mixed in NSCLC; however, multimodal biomarkers captured responders and nonresponders similarly to other biomarkers within melanoma and across all cancers. The performance of the biomarkers varies greatly among studies despite accounting for cancer type, and additional work will be needed to optimize these biomarkers.
Funding: DMR and AM were supported in part by the Clinical and Translational Science Collaborative of Cleveland (KL2TR002547) from the National Center for Advancing Translational Sciences component of the National Institutes of Health.
Disclosures: DMR has stock and other ownership interests in Clarified Precision Medicine. He has served in a consultant and advisory role for Pharmazam. He has received research funding from Novo Nordisk and has intellectual property related to the detection of liver cancer. HLM has stock and other ownership interests in Cancer Genetics and Clarified Precision Medicine, and he serves as a consultant or in an advisory role for Admera Health, Cancer Genetics, eviCore Healthcare, Gentris, National Institutes of Health/National Cancer Institute, Pharmazam, Saladax Biomedical, and VieCure.
Availability of Data: The studies underlying the meta-analyses in this article are available in its online supplemental material.
Prior Presentations: A preliminary version of this analysis was made available on medRxiv (https://doi.org/10.1101/2020.11.25.20238865).
Arshiya Mariam, BS1; Suneel Kamath, MD2; Kimberly Schveder, MS3; Howard L. McLeod, PharmD4; Daniel M. Rotroff, PhD5
1Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH; Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, OH; email@example.com
2Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH; firstname.lastname@example.org
3Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH; email@example.com
4Intermountain Precision Genomics, Intermountain Healthcare, St George, UT; firstname.lastname@example.org
5Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, OH, USA; Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, OH; Cleveland Clinic Lerner College of Medicine, Cleveland, OH; Endocrinology and Metabolism Institute, Cleveland, OH; email@example.com
Daniel M. Rotroff, PhD, MSPH
Department of Quantitative Health Sciences
Lerner Research Institute
9500 Euclid Avenue
Cleveland, OH 44195