Multi-Stream Representation Learning for Pedestrian Trajectory Prediction
DOI: https://doi.org/10.1609/aaai.v37i3.25389

Keywords: CV: Video Understanding & Activity Analysis, CV: Vision for Robotics & Autonomous Driving, CV: Motion & Tracking

Abstract
Forecasting the future trajectories of pedestrians is an important task in computer vision, with applications ranging from security cameras to autonomous driving. It is very challenging because pedestrians not only move individually across time but also interact spatially, and in a multi-agent scenario the spatial and temporal information is deeply coupled. Learning such complex spatio-temporal correlations is a fundamental problem in pedestrian trajectory prediction. Inspired by the process by which the hippocampus processes and integrates spatio-temporal information to form memories, we propose a novel multi-stream representation learning module to learn the complex spatio-temporal features of pedestrian trajectories. Specifically, we learn temporal, spatial, and cross spatio-temporal correlation features in three respective pathways, and then adaptively integrate these features with learnable weights via a gated network. In addition, we leverage a sparse attention gate to select the informative interactions and correlations introduced by complex spatio-temporal modeling, reducing the complexity of our model. We evaluate the proposed method on two commonly used datasets, ETH-UCY and SDD, and the experimental results demonstrate that it achieves state-of-the-art performance. Code: https://github.com/YuxuanIAIR/MSRL-master
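The abstract describes fusing three feature streams (temporal, spatial, and cross spatio-temporal) with learnable weights produced by a gated network. The paper's actual architecture is not reproduced here; the following is only a minimal NumPy sketch of that general gating idea, where the gate parameterization (a single linear layer over the concatenated streams, `W_gate`, `b_gate`) and all shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(f_temporal, f_spatial, f_cross, W_gate, b_gate):
    """Adaptively fuse three feature streams with gate weights.

    Each stream has shape (N, D) for N pedestrians. A (hypothetical)
    linear gate over the concatenated streams yields per-pedestrian
    convex weights over the three pathways.
    """
    concat = np.concatenate([f_temporal, f_spatial, f_cross], axis=-1)  # (N, 3D)
    gates = softmax(concat @ W_gate + b_gate, axis=-1)                  # (N, 3)
    stacked = np.stack([f_temporal, f_spatial, f_cross], axis=1)        # (N, 3, D)
    return (gates[:, :, None] * stacked).sum(axis=1)                    # (N, D)

# Toy example: 4 pedestrians, 8-dim features per stream
rng = np.random.default_rng(0)
N, D = 4, 8
f_t = rng.normal(size=(N, D))
f_s = rng.normal(size=(N, D))
f_c = rng.normal(size=(N, D))
W = rng.normal(size=(3 * D, 3))
b = np.zeros(3)
fused = gated_fusion(f_t, f_s, f_c, W, b)
print(fused.shape)  # (4, 8)
```

Because the gate weights are a softmax, the fused feature is a convex combination of the three streams, which keeps each pathway's contribution bounded and interpretable.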
Published
2023-06-26
How to Cite
Wu, Y., Wang, L., Zhou, S., Duan, J., Hua, G., & Tang, W. (2023). Multi-Stream Representation Learning for Pedestrian Trajectory Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2875-2882. https://doi.org/10.1609/aaai.v37i3.25389
Section
AAAI Technical Track on Computer Vision III