(P135) Quantitative Assessment of Target Delineation Variability for Thymic Cancers: Agreement Evaluation of a Prospective Segmentation Challenge

April 30, 2015

Emma B. Holliday, MD, Clifton D. Fuller, MD, PhD, Jayashree Kalpathy-Cramer, PhD, Daniel Gomez, MD, Andreas Rimner, MD, PhD, Ying Li, MD, PhD, Suresh Senan, MD, PhD, Lynn D. Wilson, MD, MPH, Jehee Choi, MD, Ritsuko Komaki, MD, Charles R. Thomas, MD; UT MD Anderson Cancer Center; Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School; Memorial Sloan-Kettering Cancer Center; UT Health Science Center, San Antonio; VU Medical Center; Yale University; Loyola University; Oregon Health and Science University Knight Cancer Institute

PURPOSE: We sought to quantitatively determine the relative interobserver variability of expert target volume delineation as part of a larger standardization effort to develop an expert-consensus contouring atlas to complement radiotherapy recommendations for thymic cancers.

METHODS: A pilot dataset was created, consisting of a standardized case presentation with anonymized pre- and postoperative DICOM computed tomography (CT) image sets from a single patient with Masaoka-Koga stage III thymoma. Participating expert thoracic radiation oncologists delineated tumor targets on the pre- and postoperative scans as they would for a definitive and adjuvant case, respectively. Respondents then completed a survey detailing the dose prescription and planning target volume (PTV) margins that they would recommend in definitive and postoperative (ie, R1 vs R2) scenarios. Interobserver variability was analyzed quantitatively with Warfield’s simultaneous truth and performance-level estimation (STAPLE) algorithm and the Dice similarity coefficient (DSC).
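As an illustration of the DSC comparison described in the methods, the following is a minimal sketch, not the authors' analysis pipeline. It assumes each observer's contour has already been rasterized into a set of voxel index tuples; the helper names (`dice`, `pairwise_dice`, `box`) and the toy rectangular contours are hypothetical and stand in for real DICOM structure sets.

```python
from itertools import combinations
from statistics import mean, stdev


def dice(a, b):
    """Dice similarity coefficient of two voxel sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        # Convention assumed for this sketch: two empty contours agree fully.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_dice(contours):
    """DSC for every unordered pair of observer contours."""
    return [dice(a, b) for a, b in combinations(contours, 2)]


def box(x0, x1, y0, y1):
    """Hypothetical toy contour: all voxels in a 2D rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Three made-up observers who each drew a slightly shifted 10 x 10 target.
observers = [box(0, 10, 0, 10), box(1, 11, 0, 10), box(0, 10, 1, 11)]
scores = pairwise_dice(observers)

print([round(s, 2) for s in scores])
print(f"mean DSC {mean(scores):.2f}, SD {stdev(scores):.2f}")
```

A DSC of 1 means two contours coincide exactly and 0 means they do not overlap. Whether the study summarized DSC over observer pairs or against a consensus volume is not stated in the abstract; the pairwise form here is one common choice.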

RESULTS: Seven users completed the contouring tasks for definitive and adjuvant cases; of these users, five completed online surveys. Segmentation performance was assessed, with high mean STAPLE-estimated segmentation sensitivity for definitive-case gross tumor volume (GTV) and clinical target volume (CTV) at 0.77 and 0.80, respectively, and postoperative CTV sensitivity of 0.55; all volumes had a specificity of ≥ 0.99. Interobserver agreement was markedly higher for the definitive-case target volumes, with mean ± SD DSC of 0.88 ± 0.03 and 0.89 ± 0.04 for GTV and CTV, respectively, compared with postoperative CTV DSC of 0.69 ± 0.06 (Kruskal-Wallis: P < .01).
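To make the sensitivity and specificity figures concrete, here is a small sketch of how the two quantities are computed for one observer against a fixed reference volume. It does not implement STAPLE itself, which estimates the reference and the per-observer performance jointly by expectation-maximization; the reference mask, the grid size, and the function name `sensitivity_specificity` are assumptions made only for illustration.

```python
def sensitivity_specificity(observer, reference, grid):
    """Voxel-wise sensitivity and specificity of one contour against a reference.

    All three arguments are sets of voxel index tuples; `observer` and
    `reference` are assumed to be subsets of `grid` (the whole image).
    """
    tp = len(observer & reference)   # target voxels the observer included
    fn = len(reference - observer)   # target voxels the observer missed
    fp = len(observer - reference)   # non-target voxels the observer included
    tn = len(grid) - tp - fn - fp    # background both agree is non-target
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both target and background voxels")
    return tp / (tp + fn), tn / (tn + fp)


def box(x0, x1, y0, y1):
    """Hypothetical toy region: all voxels in a 2D rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


grid = box(0, 100, 0, 100)          # made-up 100 x 100 image slice
reference = box(40, 60, 40, 60)     # made-up consensus target, 400 voxels
observer = box(42, 58, 40, 62)      # made-up observer contour, 352 voxels

sens, spec = sensitivity_specificity(observer, reference, grid)
print(f"sensitivity {sens:.2f}, specificity {spec:.4f}")
```

Because the target occupies a small fraction of the image, true-negative background voxels dominate the specificity term, so specificity stays close to 1 even when contours disagree noticeably. Sensitivity is therefore the more discriminating of the two figures in this setting.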

CONCLUSION: Expert agreement for definitive-case volumes was exceptionally high, although significantly lower agreement was noted postoperatively. Technique and dose prescription between experts were substantively consistent, and these preliminary results will be utilized to create an expert-consensus contouring atlas to aid the nonexpert radiation oncologist in the planning of these challenging, rare tumors.

Proceedings of the 97th Annual Meeting of the American Radium Society - americanradiumsociety.org