Can Machine Learning Help Anticipate Death from Cancer?

A new machine-learning computer model helped predict which cancer patients may die soon.

Can a machine-learning algorithm detect the intimations of mortality among a group of cancer patients, telling doctors and patients who should be having conversations about the end of life? A particular machine-learning computer model can do so, using factors found in electronic health records (EHRs), according to a new paper by University of Pennsylvania School of Medicine researchers in the journal JAMA Network Open.

“In this cohort study, machine learning algorithms based on structured electronic health record data accurately identified patients with cancer at risk of short-term mortality,” they wrote.

Of the patients flagged as “high priority” by the model, 51% died within a 180-day window, according to the authors. Fewer than 4% of the patients deemed “lower priority” died in the same time frame, according to the findings. 
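Those two figures are observed death rates within groups defined by the model's output; for the flagged group, this rate is what screening studies call the positive predictive value. A minimal Python sketch of that bookkeeping follows. The `Patient` record, the 0.5 cutoff, and the toy cohort are all hypothetical, and the article does not say how the study chose its own threshold.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    predicted_risk: float    # hypothetical: model's estimated probability of death within 180 days
    died_within_180d: bool   # observed outcome

def observed_mortality(group):
    """Share of patients in the group who died; NaN for an empty group."""
    if not group:
        return float("nan")
    return sum(p.died_within_180d for p in group) / len(group)

def stratify(patients, threshold):
    """Flag patients at or above the threshold as high priority and return
    observed mortality in the flagged and unflagged groups."""
    high = [p for p in patients if p.predicted_risk >= threshold]
    low = [p for p in patients if p.predicted_risk < threshold]
    return observed_mortality(high), observed_mortality(low)

# Hypothetical toy cohort, not study data.
cohort = [
    Patient(0.90, True),
    Patient(0.70, False),
    Patient(0.20, False),
    Patient(0.10, False),
    Patient(0.05, True),
]
high_rate, low_rate = stratify(cohort, threshold=0.5)
# For this toy cohort, high_rate is 0.5 and low_rate is 1/3.
```

Moving the threshold trades off the two numbers: a higher cutoff flags fewer patients, usually with a higher death rate among those flagged.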

The team used the records of 26,525 adult patients who had outpatient oncology or hematology/oncology encounters in the University of Pennsylvania Health System. The records came from Clarity, an Epic EHR database that includes demographic, comorbidity, and laboratory data, among other information. Of the 26,525 patients, 1,065 (4%) died within the 180-day window. 

The UPenn team previously instituted a system called Palliative Connect, which is intended to identify patients who are approaching end-of-life decisions. The new machine-learning tool described in the study, however, focuses specifically on oncology and is meant to prompt these conversations in outpatient settings. 

For this group of patients, the researchers built three machine-learning models: a gradient-boosting model, a logistic regression model, and a random-forest model. The random forest, an ensemble approach designed to reduce overfitting, produced the most accurate predictions, according to the results. 
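Of the three model types, the random forest is the least self-explanatory. It trains many decision trees, each on a bootstrap resample of the patients and restricted to a random subset of the input features, then averages their votes; that averaging is what helps curb the overfitting a single tree is prone to. The toy sketch below illustrates the idea using only Python's standard library and one-split trees ("stumps"), for which choosing features per tree is the same as choosing them per split. It is not the study's model: the function names, parameters, and the synthetic albumin and age data are hypothetical, and a real model would use deeper trees from a library such as scikit-learn.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Fit a one-split decision tree using only the given features.

    Tries every observed value of each feature as a threshold and keeps
    the split with the fewest misclassified training rows.
    """
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Each side of the split predicts its majority label.
            left_pred = Counter(left).most_common(1)[0][0] if left else 0
            right_pred = Counter(right).most_common(1)[0][0] if right else 0
            errors = sum(y != left_pred for y in left) + sum(y != right_pred for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_pred, right_pred)
    _, f, t, left_pred, right_pred = best
    return lambda row: left_pred if row[f] <= t else right_pred

def fit_random_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature subsets: the two ideas behind a random forest."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: n rows drawn with replacement
        feats = rng.sample(range(d), n_features)     # features this tree may split on
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_risk(forest, row):
    """Fraction of trees voting 'died' (label 1), used as a crude risk score."""
    return sum(tree(row) for tree in forest) / len(forest)

# Hypothetical synthetic cohort: label 1 ("died") whenever albumin < 3.0;
# age is included as an uninformative feature.
rng = random.Random(1)
rows, labels = [], []
for _ in range(100):
    albumin = rng.uniform(2.0, 5.0)
    age = rng.uniform(30, 90)
    rows.append([albumin, age])
    labels.append(1 if albumin < 3.0 else 0)

forest = fit_random_forest(rows, labels)
scores = [predict_risk(forest, row) for row in rows]
```

Averaging the votes turns the forest into a score between 0 and 1, and a score of this kind is what a high-risk cutoff would be applied to.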

Among the patients the random-forest model classified as high-risk, observed 180-day mortality was 51.3% (95% CI, 43.6%-58.8%). In the low-risk group, mortality over the same period was 3.4% (95% CI, 3.0%-3.8%).
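The article reports each mortality rate with a 95% confidence interval but does not say which method produced them. The Wilson score interval is one common choice for a proportion such as "deaths out of patients in a group", and a minimal sketch follows. The counts in the example are hypothetical, because the article does not give the size of each risk group, so the sketch will not reproduce the reported intervals.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    events: number of patients who died; n: group size.
    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("group size must be positive")
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical counts: 50 deaths in a group of 100 patients.
lower, upper = wilson_interval(50, 100)
print(f"{lower:.1%} to {upper:.1%}")  # 40.4% to 59.6%
```

Interval width shrinks as a group grows and as its rate moves toward 0 or 1, which fits the much tighter interval around the low-risk group's rate.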

By the 500-day mark, mortality had risen to 64.4% (95% CI, 56.7%-71.4%) in the high-risk group and 7.6% (95% CI, 7.0%-8.2%) in the low-risk group. 

The random-forest model focused on 10 particular factors: metastatic cancer, as measured by the recent count of diagnostic codes; albumin, the last laboratory value; alkaline phosphatase, the last laboratory value; albumin, the last laboratory value; the patient’s age; alkaline phosphatase, the maximum laboratory value; solid tumor, as measured by the total count of diagnostic codes; solid tumor, as measured by the recent count of diagnostic codes; metastatic cancer, as measured by the total count of diagnostic codes; and lymphocytes, the minimum laboratory value, expressed as a percentage. 
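Most of these factors are not raw measurements but summaries of a patient's record over time: the last, maximum, or minimum value of a lab test, and total or recent counts of diagnostic codes. The sketch below shows one way such features could be derived from EHR rows. The record types, feature names, and the 180-day definition of "recent" are assumptions for illustration; the article does not say how the study defined its time windows.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class LabResult:
    name: str        # hypothetical lab name, e.g. "albumin"
    value: float
    taken_on: date

@dataclass
class Diagnosis:
    category: str    # hypothetical code grouping, e.g. "metastatic_cancer"
    recorded_on: date

def summarize_record(labs, diagnoses, as_of, recent_days=180):
    """Turn one patient's longitudinal EHR rows into a flat feature dict.

    Only data recorded on or before `as_of` is used, so the features reflect
    what would have been known at prediction time. `recent_days` is an
    assumed window, not one taken from the study.
    """
    features = {}
    values_by_lab = {}
    for lab in sorted(labs, key=lambda r: r.taken_on):
        if lab.taken_on <= as_of:
            values_by_lab.setdefault(lab.name, []).append(lab.value)
    for name, values in values_by_lab.items():
        features[f"{name}_last"] = values[-1]  # list is in date order
        features[f"{name}_max"] = max(values)
        features[f"{name}_min"] = min(values)

    cutoff = as_of - timedelta(days=recent_days)
    for dx in diagnoses:
        if dx.recorded_on > as_of:
            continue  # ignore codes recorded after the prediction date
        total_key = f"{dx.category}_total_count"
        recent_key = f"{dx.category}_recent_count"
        features[total_key] = features.get(total_key, 0) + 1
        features[recent_key] = features.get(recent_key, 0) + (1 if dx.recorded_on >= cutoff else 0)
    return features

# Hypothetical patient record, not study data.
labs = [
    LabResult("albumin", 3.9, date(2019, 1, 1)),
    LabResult("albumin", 3.1, date(2019, 5, 1)),
    LabResult("alk_phos", 250.0, date(2019, 3, 1)),
    LabResult("alk_phos", 180.0, date(2019, 5, 1)),
    LabResult("lymphocytes_pct", 12.0, date(2019, 5, 1)),
]
diagnoses = [
    Diagnosis("metastatic_cancer", date(2018, 6, 1)),
    Diagnosis("metastatic_cancer", date(2019, 4, 1)),
    Diagnosis("solid_tumor", date(2019, 4, 15)),
]
features = summarize_record(labs, diagnoses, as_of=date(2019, 6, 1))
```

Restricting every summary to data available before the prediction date matters: features built from later records would leak the outcome into the model.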

“Our process of using machine learning to flag high-risk patients in real time is broadly applicable, and our approach risk-stratifies patients in a usable way that just hasn’t been available to us before,” said Ravi Parikh, MD, the lead author, who is an instructor of medical ethics and health policy at the University of Pennsylvania, and also a staff physician at the Corporal Michael J. Crescenz VA Medical Center. 

“Having an algorithm like this may make doctors in (the) clinic stop and think, ‘Is this the right time to talk about this patient’s preferences?’” added Parikh.