Explainable Survival Analysis with Convolution-Involved Vision Transformer

Authors

  • Yifan Shen, Beijing University of Posts and Telecommunications
  • Li Liu, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
  • Zhihao Tang, Beijing University of Posts and Telecommunications
  • Zongyi Chen, Beijing University of Posts and Telecommunications
  • Guixiang Ma, University of Illinois at Chicago
  • Jiyan Dong, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
  • Xi Zhang, Beijing University of Posts and Telecommunications
  • Lin Yang, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
  • Qingfeng Zheng, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China

DOI:

https://doi.org/10.1609/aaai.v36i2.20118

Keywords:

Computer Vision (CV)

Abstract

Image-based survival prediction models can help doctors diagnose and treat cancer patients. With advances in digital pathology, whole slide images (WSIs) offer ever higher resolution and more detail for diagnosis. However, their gigabyte-scale size makes most models computationally infeasible. As a result, most existing models do not use the complete WSI; they take only a pre-selected subset of key patches or patch clusters as input, which may fail to fully capture the patient's tumor morphology. In this work, we aim to develop a novel survival analysis model that fully utilizes the complete WSI information. We show that a Vision Transformer (ViT) backbone with convolution operations incorporated into it is an effective framework for improving prediction performance. Additionally, we present a post-hoc explanation method that identifies the most salient patches and distinct morphological features, making the model more faithful and its results easier for human users to comprehend. Evaluations on two large cancer datasets show that our proposed model is more effective and more interpretable for survival prediction.
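
To make the idea of feeding a complete image to a transformer as a sequence of patches more concrete, the following is a minimal NumPy sketch of non-overlapping patch tiling. It is an illustration only, not the authors' pipeline: the function name `tile_into_patches`, the patch size, and the toy array standing in for a slide are all assumptions introduced here, and a real WSI would need tiled, out-of-core reading rather than a single in-memory array.

```python
import numpy as np


def tile_into_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) array into non-overlapping, flattened patch tokens.

    Hypothetical helper for illustration. Rows and columns that do not fill
    a whole patch are dropped. Returns shape (num_patches, patch * patch * C),
    with patches ordered row by row.
    """
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    # Crop so that height and width are exact multiples of the patch size.
    cropped = image[: rows * patch, : cols * patch]
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    grid = cropped.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one token vector.
    return grid.reshape(rows * cols, patch * patch * c)


# Toy stand-in for a slide: 8 x 12 pixels, 3 channels (assumed values).
slide = np.arange(8 * 12 * 3, dtype=np.float32).reshape(8, 12, 3)
tokens = tile_into_patches(slide, patch=4)
print(tokens.shape)  # (6, 48)
```

Each row of `tokens` corresponds to one patch, so using every row keeps the full image, whereas selecting a subset of rows corresponds to the key-patch strategy the abstract contrasts against.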

Published

2022-06-28

How to Cite

Shen, Y., Liu, L., Tang, Z., Chen, Z., Ma, G., Dong, J., Zhang, X., Yang, L., & Zheng, Q. (2022). Explainable Survival Analysis with Convolution-Involved Vision Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 36(2), 2207-2215. https://doi.org/10.1609/aaai.v36i2.20118

Section

AAAI Technical Track on Computer Vision II