Risk Modeling of Time-Varying Covariates Using an Ensemble of Survival Trees: Predicting Future Cancer Events

Authors

  • Dan Coster Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
  • Eyal Fisher Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge
  • Shani Shenhar-Tsarfaty Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center; Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
  • Tehillah Menes Department of Surgery C & Surgical Oncology, Chaim Sheba Medical Center, Ramat Gan, Israel; Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
  • Shlomo Berliner Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center; Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
  • Ori Rogowski Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center; Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
  • David Zeltser Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center; Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
  • Itzhak Shapira Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center; Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
  • Eran Halperin Department of Computer Science, University of California, Los Angeles, California, USA; Department of Computational Medicine, University of California, Los Angeles, California, USA
  • Saharon Rosset Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv, Israel
  • Malka Gorfine Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv, Israel
  • Ron Shamir Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

DOI:

https://doi.org/10.1609/aaaiss.v2i1.27711

Keywords:

Survival Analysis, Risk Prediction, Cancer Screening, Random Forest, Machine Learning, Time-Varying Covariates

Abstract

The challenge of survival prediction is ubiquitous in medicine, but only a handful of methods are available for survival prediction based on time-varying data. Here we propose a novel method for this problem, using a random forest of survival trees for left-truncated and right-censored data. We demonstrate the advantage of our method in predicting breast cancer and prostate cancer risk among healthy individuals from routine laboratory measurements, vital signs and age. We analyzed electronic medical records of 20,317 healthy individuals who underwent routine checkups and identified those who later developed cancer. In cross-validation, our method predicted future prostate and breast cancers six months before diagnosis with an area under the ROC curve of 0.62±0.05 and 0.6±0.03, respectively, outperforming a standard random forest, a random survival forest, a Cox regression model, Dynamic-DeepHit and a single survival tree. Our work proposes a new framework for survival risk prediction in time-varying data, and our results suggest that computational analysis of data on healthy individuals can improve the detection of those at risk of developing cancer.
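
The abstract does not spell out how repeated checkup measurements are presented to the survival trees. One common encoding for left-truncated and right-censored data, assumed here for illustration and not taken from the paper, is the counting-process format: each visit opens an interval (start, stop] that carries the covariates measured at that visit, and only the final interval may be marked as an event. The sketch below uses hypothetical names (`Interval`, `to_intervals`, the `crp` covariate) and made-up values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Interval:
    """One pseudo-subject row in (start, stop, event) form. Hypothetical type."""
    patient_id: str
    start: float   # age at the visit (left-truncation time)
    stop: float    # age at the next visit, or at diagnosis / last follow-up
    event: bool    # True only if the diagnosis occurs at `stop`
    covariates: Dict[str, float]


def to_intervals(
    patient_id: str,
    visits: List[Tuple[float, Dict[str, float]]],
    end_age: float,
    diagnosed: bool,
) -> List[Interval]:
    """Split one patient's visit history into left-truncated,
    right-censored intervals.

    visits    : (age at visit, covariates measured at that visit) pairs
    end_age   : age at diagnosis if `diagnosed`, else age at last follow-up
    diagnosed : whether follow-up ended with a cancer diagnosis
    """
    ordered = sorted(visits, key=lambda v: v[0])
    intervals: List[Interval] = []
    for i, (age, cov) in enumerate(ordered):
        if age >= end_age:
            break  # visits at or after the end of follow-up carry no risk time
        is_last = i + 1 == len(ordered) or ordered[i + 1][0] >= end_age
        stop = end_age if is_last else ordered[i + 1][0]
        intervals.append(
            Interval(patient_id, age, stop, diagnosed and is_last, dict(cov))
        )
    return intervals


# Illustrative data only: three checkups, diagnosis at age 53.
example = to_intervals(
    "p1",
    [(50.0, {"crp": 1.2}), (51.0, {"crp": 1.5}), (52.5, {"crp": 3.1})],
    end_age=53.0,
    diagnosed=True,
)
for row in example:
    print(row.start, row.stop, row.event, row.covariates)
```

Under this encoding a patient who never develops cancer contributes the same kind of rows with `event` false throughout, so censored follow-up still adds risk time to the model.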

Published

2024-01-22

Section

Second Symposium on Survival Prediction: Algorithms, Challenges, and Applications (SPACA)