Multi-Stream Representation Learning for Pedestrian Trajectory Prediction

Authors

  • Yuxuan Wu, Xi'an Jiaotong University
  • Le Wang, Xi'an Jiaotong University
  • Sanping Zhou, Xi'an Jiaotong University
  • Jinghai Duan, Xi'an Jiaotong University
  • Gang Hua, Wormpex AI Research
  • Wei Tang, University of Illinois at Chicago

DOI:

https://doi.org/10.1609/aaai.v37i3.25389

Keywords:

CV: Video Understanding & Activity Analysis, CV: Vision for Robotics & Autonomous Driving, CV: Motion & Tracking

Abstract

Forecasting the future trajectories of pedestrians is an important task in computer vision with a range of applications, from security cameras to autonomous driving. The task is challenging because pedestrians not only move individually over time but also interact spatially, and in a multi-agent scenario the spatial and temporal information are deeply coupled. Learning this complex spatio-temporal correlation is a fundamental issue in pedestrian trajectory prediction. Inspired by the way the hippocampus processes and integrates spatio-temporal information to form memories, we propose a novel multi-stream representation learning module to learn complex spatio-temporal features of pedestrian trajectories. Specifically, we learn temporal, spatial, and cross spatio-temporal correlation features in three separate pathways and then adaptively integrate these features with learnable weights through a gated network. In addition, we use a sparse attention gate to select the informative interactions and correlations produced by complex spatio-temporal modeling and to reduce the complexity of our model. We evaluate the proposed method on two commonly used datasets, ETH-UCY and SDD, and the experimental results demonstrate that it achieves state-of-the-art performance. Code: https://github.com/YuxuanIAIR/MSRL-master
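The two mechanisms named in the abstract, gated integration of three feature streams and a sparse attention gate, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names, the softmax-weighted sum used for fusion, and the threshold-and-renormalize rule used for sparsification are all assumptions chosen for illustration, and `gate_logits` stands in for the output of a learned gating network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(temporal, spatial, cross, gate_logits):
    """Fuse three equally sized feature vectors with softmax gate weights.

    gate_logits is a hypothetical stand-in for the three scores a learned
    gating network would produce; here it is simply passed in.
    """
    weights = softmax(gate_logits)
    fused = [
        weights[0] * t + weights[1] * s + weights[2] * c
        for t, s, c in zip(temporal, spatial, cross)
    ]
    return fused, weights


def sparse_gate(attention, threshold):
    """Zero out attention weights below a threshold, then renormalize.

    Assumed sparsification rule for illustration only; the paper's sparse
    attention gate may be defined differently.
    """
    kept = [a if a >= threshold else 0.0 for a in attention]
    total = sum(kept)
    if total == 0.0:
        return kept  # nothing survived the threshold
    return [a / total for a in kept]


if __name__ == "__main__":
    fused, weights = gated_fusion(
        [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], gate_logits=[0.0, 0.0, 0.0]
    )
    print(fused, weights)
    print(sparse_gate([0.5, 0.3, 0.15, 0.05], threshold=0.2))
```

With equal gate logits the fusion reduces to a plain average of the three streams. In a trained model the gate weights would instead depend on the input, so the balance between temporal, spatial, and cross spatio-temporal features could vary per pedestrian and per time step.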

Published

2023-06-26

How to Cite

Wu, Y., Wang, L., Zhou, S., Duan, J., Hua, G., & Tang, W. (2023). Multi-Stream Representation Learning for Pedestrian Trajectory Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2875-2882. https://doi.org/10.1609/aaai.v37i3.25389

Section

AAAI Technical Track on Computer Vision III