CaTFormer: Causal Temporal Transformer with Dynamic Contextual Fusion for Driving Intention Prediction

Authors

  • Sirui Wang Beijing Jiaotong University
  • Zhou Guan Beijing Jiaotong University
  • Bingxi Zhao Beijing Jiaotong University
  • Tongjia Gu Beijing Jiaotong University
  • Jie Liu Beijing Jiaotong University

DOI:

https://doi.org/10.1609/aaai.v40i12.37977

Abstract

Accurate prediction of driving intention is key to enhancing the safety and interactive efficiency of human-machine co-driving systems. It serves as a cornerstone for achieving high-level autonomous driving. However, current approaches remain inadequate for accurately modeling the complex spatiotemporal interdependencies and the unpredictable variability of human driving behavior. To address these challenges, we propose CaTFormer, a Causal Temporal Transformer that explicitly models causal interactions between driver behavior and environmental context for robust intention prediction. Specifically, CaTFormer introduces a novel Reciprocal Delayed Fusion (RDF) mechanism for precise temporal alignment of interior and exterior feature streams, a Counterfactual Residual Encoding (CRE) module that systematically eliminates spurious correlations to reveal authentic causal dependencies, and an innovative Feature Synthesis Network (FSN) that adaptively synthesizes these purified representations into coherent temporal representations. Experimental results demonstrate that CaTFormer attains state-of-the-art performance on the Brain4Cars dataset. It effectively captures complex causal temporal dependencies and enhances both the accuracy and transparency of driving intention prediction.

Published

2026-03-14

How to Cite

Wang, S., Guan, Z., Zhao, B., Gu, T., & Liu, J. (2026). CaTFormer: Causal Temporal Transformer with Dynamic Contextual Fusion for Driving Intention Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10101–10108. https://doi.org/10.1609/aaai.v40i12.37977

Section

AAAI Technical Track on Computer Vision IX