Commentary|Articles|April 12, 2026

Closing the Trial Access Gap: Neuro-Symbolic AI and Oncology Trial Matching

Fact checked by: Russ Conroy

A prospective evaluation suggests that embedding large language models within an oncology-specific knowledge graph may help close oncology's trial enrollment gap.

Despite decades of investment in clinical trial infrastructure, oncology continues to face a fundamental bottleneck: patient enrollment. Fewer than 5% of adult patients with cancer participate in clinical trials—not due to a lack of available studies, but because matching complex patient profiles to eligibility-dense protocols remains labor intensive, error prone, and operationally unsustainable at scale.1,2 Manual screening requires approximately 120 minutes per patient, involving exhaustive chart review across fragmented records that encode eligibility criteria spanning genomics, temporal constraints, and organ function parameters.1 These challenges are consistently cited by oncology providers as major structural barriers to enrollment, particularly in community settings where research infrastructure is limited.4

Against this backdrop, Loaiza-Bonilla and colleagues present a prospective, 12-month evaluation of a neuro-symbolic, multi-agent artificial intelligence (AI) platform designed to automate and accelerate clinical trial matching across 3804 consecutive patients with metastatic or progressive malignancies.5 Published in ESMO Real World Data and Digital Oncology, this study represents one of the most comprehensive real-world assessments of AI-assisted eligibility screening to date—and, importantly, reframes how such systems should be architected, validated, and deployed in oncology practice.

What Was Built and Why It Matters

The platform described here departs meaningfully from prior AI-based approaches, and that distinction is critical. Most large language model (LLM)–based systems attempt to infer eligibility directly from free-text clinical records using zero-shot or chain-of-thought (CoT) prompting.6,7 While these models demonstrate strong capability in extracting clinical information from unstructured narratives, they remain inherently probabilistic—and therefore unreliable when tasked with executing the strict Boolean, temporal, and exception-based logic that governs trial eligibility.

The TrialGPT framework demonstrated that zero-shot LLM matching could achieve criterion-level accuracy approaching 87% in curated datasets and reduce clinician workload.7 Similarly, Stanford-based efforts showed promising results using retrieval-augmented pipelines with GPT-4.6 However, both approaches lacked a deterministic, auditable reasoning layer—an essential requirement for safety-critical decision-making in oncology.

The neuro-symbolic system introduced by the authors addresses this limitation through architectural separation. It combines 4 core components: OncoAgents (domain-tuned LLMs for extraction and normalization), OncoGraph (a structured oncology knowledge graph encoding eligibility logic), OncoRecommend (a prioritization engine), and OncoSet (an expert-curated dataset of annotated clinical records).5 Crucially, LLM outputs are not used directly for trial matching. Instead, extracted features must be mapped to canonical oncology concepts and validated against graph-based constraints before any recommendation is generated.

This design reflects a fundamental principle articulated in Moravec’s Paradox: while LLMs excel at interpreting complex, context-rich text, they are poorly suited for executing precise logical rules.3 By externalizing reasoning into a deterministic knowledge graph, the system mitigates hallucination risk and aligns with emerging best practices in clinical AI, including those outlined in the TRIPOD-LLM framework.8
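As a rough illustration of the pattern described above, probabilistic extraction feeding a deterministic rule layer, the following Python sketch separates the two roles. All field names, canonical concepts, and eligibility rules here are invented for illustration and do not reflect the study's actual implementation.

```python
from datetime import date

# Step 1: features as an LLM extractor might emit them (free-text variants).
# These values are toy inputs, not study data.
extracted = {"diagnosis": "NSCLC, metastatic", "egfr_status": "EGFR ex19del",
             "last_progression": date(2025, 11, 3), "ecog": 1}

# Step 2: normalization to canonical oncology concepts
# (in a real system, an ontology lookup rather than a literal dict).
CANONICAL = {"NSCLC, metastatic": "non_small_cell_lung_cancer_metastatic",
             "EGFR ex19del": "EGFR_exon19_deletion"}

def normalize(features):
    out = dict(features)
    out["diagnosis"] = CANONICAL[features["diagnosis"]]
    out["egfr_status"] = CANONICAL[features["egfr_status"]]
    return out

# Step 3: deterministic Boolean and temporal eligibility logic,
# executed as code rather than inferred by the model.
def eligible(p, today=date(2026, 1, 15)):
    return (p["diagnosis"] == "non_small_cell_lung_cancer_metastatic"
            and p["egfr_status"] == "EGFR_exon19_deletion"
            and p["ecog"] <= 2
            and (today - p["last_progression"]).days <= 180)  # within 6 months

print(eligible(normalize(extracted)))  # → True for these toy inputs
```

The point of the separation is that the final eligibility decision comes from auditable Boolean and temporal rules; the extractor's output can be inspected and corrected before it ever reaches that layer.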

Performance at Scale

The system was evaluated across 157,367 clinical document pages—approximately 86.5 million tokens—capturing the heterogeneity of real-world oncology documentation, including structured notes, scanned PDFs, and faxed pathology reports. The primary end point was accuracy at the patient–trial level, benchmarked against a dual-oncologist gold standard (Cohen’s κ = 0.92).

The full neuro-symbolic system achieved an F1 score of 0.82 (95% CI, 0.81–0.83), with balanced sensitivity and specificity of approximately 0.84.5 In contrast, zero-shot prompting achieved an F1 score of 0.47 with both GPT-4 and GPT-4o, while CoT prompting improved performance to 0.67. The progression mirrors the incremental addition of structure: unconstrained inference performs poorly, guided reasoning improves outcomes, and deterministic validation delivers the highest reliability.

Ablation analyses further reinforce this conclusion. Removing knowledge graph grounding reduced F1 from 0.82 to 0.79, while eliminating multi-agent decomposition reduced performance to 0.78.5 These findings suggest that the primary performance gains stem from converting unstructured clinical data into canonical representations and enforcing eligibility logic through deterministic execution.
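For readers less familiar with the metric behind these comparisons, F1 is the harmonic mean of precision and recall. The sketch below uses invented confusion counts, not the study's data, simply to show how balanced precision and recall of about 0.82 yield an F1 of 0.82.

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy confusion counts (illustrative only): 82 true positives,
# 18 false positives, 18 false negatives.
print(round(f1_score(tp=82, fp=18, fn=18), 2))  # → 0.82
```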

Operationally, the system screened 23,912 patient–trial pairs and identified 17,912 oncologist-confirmed matches. Median screening time decreased from approximately 120 minutes to 30 minutes per patient—split between automated processing and clinician verification—with a median time-to-recommendation of under 7 days. These gains are not incremental; they represent a meaningful reallocation of clinical effort away from administrative burden toward patient-facing care.

Equity as a Co-Primary Signal

The study’s fairness analysis warrants particular attention. No demographic subgroup exceeded the pre-specified 10-point F1 disparity threshold. The largest observed gap, approximately 7 points, occurred between White and Black or African American patients.5 Importantly, this disparity was attributed primarily to upstream documentation limitations, including fragmented care histories and missing structured data, rather than model bias.

This distinction is critical. As demonstrated by Pittell et al., disparities in trial participation reflect systemic inequities in access, documentation, and referral patterns.9 Similarly, ASCO and the Association of Community Cancer Centers emphasize that improving diversity requires structural interventions beyond algorithmic optimization.10

In this context, the system’s conservative design—flagging incomplete data rather than inferring eligibility—may paradoxically amplify disparities rooted in documentation gaps. Addressing this will require parallel investment in data completeness and interoperability, particularly for historically underserved populations.
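A disparity check of the kind pre-specified in the study can be sketched as a simple subgroup comparison. The subgroup labels and F1 scores below are invented for illustration; only the 10-point threshold is drawn from the report.

```python
# Illustrative fairness gate: flag the run if any two subgroups'
# F1 scores differ by more than 10 percentage points.
subgroup_f1 = {"group_a": 0.84, "group_b": 0.77, "group_c": 0.82}

def max_f1_gap(scores):
    vals = list(scores.values())
    return max(vals) - min(vals)

gap = max_f1_gap(subgroup_f1)
print(f"largest gap: {gap * 100:.0f} points")  # 7 points in this toy example
assert gap * 100 <= 10, "disparity threshold exceeded"
```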

Error Modes and Remaining Gaps

The reported error taxonomy is both granular and clinically intuitive. Temporal reasoning accounted for 32% of errors, reflecting challenges in interpreting approximate or inconsistently documented timelines. Missing structured data contributed 28%, often leading to false negatives when eligibility criteria were implied but not explicitly recorded. Additional error sources included ambiguity and negation (18%), OCR artifacts (12%), and ontology mismatches (10%).5

These are not failures of model capability alone—they are reflections of real-world documentation complexity. Free-text ambiguity, inconsistent terminology, and fragmented longitudinal records are endemic to oncology practice. While improvements in ontology mapping, OCR fidelity, and knowledge graph expansion will mitigate some of these issues, they underscore the necessity of maintaining human oversight in deployment.

From Matching to Enrollment

A key limitation of this study is its focus on matching accuracy rather than downstream enrollment outcomes. Identifying eligible patients is necessary but not sufficient. Barriers related to logistics, socioeconomic factors, and patient consent persist beyond the matching stage. Whether improved matching translates into increased enrollment remains an open question.

Additionally, the maintenance burden of the knowledge graph is nontrivial. The reported requirement of approximately 20 hours of curator time and 4 hours of oncologist oversight per month represents a meaningful operational investment.5 While justified as the “price of safety,” this infrastructure must be considered in real-world implementation.

From a regulatory standpoint, the system is appropriately positioned as clinician-facing decision support rather than autonomous decision-making. Approximately 25% of cases are triaged for human review, and all outputs include transparent, evidence-linked rationales. This aligns with emerging guidance from CONSORT-AI, SPIRIT-AI, and TRIPOD-LLM emphasizing transparency and human oversight in clinical AI systems.8

Looking Forward

The next phase of evaluation must focus on prospective, multicenter studies with enrollment as a primary end point. Key questions include how performance generalizes across diverse electronic health record (EHR) systems, how equity gaps evolve with site-specific calibration, and whether efficiency gains translate into measurable improvements in trial participation, particularly among underrepresented populations.9,10

The architecture described here—combining LLM-based extraction with deterministic knowledge graph reasoning and clinician oversight—represents a thoughtful and pragmatic framework for deploying AI in safety-critical oncology workflows. Whether it will scale across the full complexity of real-world practice remains to be seen.

However, the evidence presented across 3804 patients and nearly 24,000 patient–trial pairs suggests that this approach is not only viable but necessary. If oncology is to meaningfully close its enrollment gap, solutions must move beyond incremental efficiency gains toward systems that are scalable, auditable, and equity-aware by design.

References

  1. Monreal I, Chappell H, Kiss R, et al. Understanding the barriers to clinical trial referral and enrollment among oncology providers within the Veterans Health Administration. Mil Med. 2025;190(3-4):e891-e898. doi:10.1093/milmed/usae441.
  2. Eldridge L, Goodman NR, Chtourou A, et al. Barriers and opportunities for cancer clinical trials in low- and middle-income countries. JAMA Netw Open. 2025;8(4):e257733. doi:10.1001/jamanetworkopen.2025.7733.
  3. Loaiza-Bonilla A, Penberthy S. Harnessing Moravec's Paradox in health care: a new era of collaborative intelligence. NEJM AI. 2025;2(5):AIp2500005. doi:10.1056/AIp2500005.
  4. Unger JM. A ground's-eye view on racial and ethnic disparities in cancer clinical trial participation. JAMA Netw Open. 2023;6(7):e2322436. doi:10.1001/jamanetworkopen.2023.22436.
  5. Loaiza-Bonilla A, Yost C, Kurnaz S, et al. Transforming oncology clinical trial matching through neuro-symbolic, multi-agent AI and an oncology-specific knowledge graph: a prospective evaluation in 3804 patients. ESMO Real World Data Digit Oncol. 2026;12. doi:10.1016/j.esmorw.2026.100706.
  6. Wornow M, Lozano A, Dash D, et al. Zero-shot clinical trial patient matching with LLMs. NEJM AI. 2025;2(1):AIcs2400360. doi:10.1056/AIcs2400360.
  7. Jin Q, Wang Z, Floudas CS, et al. Matching patients to clinical trials with large language models. Nat Commun. 2024;15:9074. doi:10.1038/s41467-024-53081-z.
  8. Gallifant J, Afshar M, Ameen S, et al. The TRIPOD-LLM reporting guideline for studies using large language models. Nat Med. 2025;31:60-69. doi:10.1038/s41591-024-03425-5.
  9. Pittell H, Calip GS, Pierre A, et al. Racial and ethnic inequities in US oncology clinical trial participation from 2017 to 2022. JAMA Netw Open. 2023;6(7):e2322515. doi:10.1001/jamanetworkopen.2023.22515.
  10. Oyer RA, Hurley P, Boehmer L, et al. Increasing racial and ethnic diversity in cancer clinical trials: an American Society of Clinical Oncology and Association of Community Cancer Centers joint research statement. J Clin Oncol. 2022;40(19):2163-2171. doi:10.1200/JCO.22.00754.
