Self-Supervised Bird’s Eye View Motion Prediction with Cross-Modality Signals

Authors

  • Shaoheng Fang Shanghai Jiao Tong University
  • Zuhong Liu Shanghai Jiao Tong University
  • Mingyu Wang University of Chinese Academy of Sciences
  • Chenxin Xu Shanghai Jiao Tong University
  • Yiqi Zhong University of Southern California
  • Siheng Chen Shanghai Jiao Tong University; Shanghai AI Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i2.27940

Keywords:

CV: Vision for Robotics & Autonomous Driving, ML: Unsupervised & Self-Supervised Learning

Abstract

Learning dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research area in robotics and autonomous driving. Current self-supervised methods mainly rely on point correspondences between point clouds, which can introduce fake flow and inconsistency and hinder the model's ability to learn accurate and realistic motion. In this paper, we introduce a novel cross-modality self-supervised training framework that addresses these issues by leveraging multi-modality data to obtain supervision signals. We design three supervision signals that preserve inherent properties of scene motion: the masked Chamfer distance loss, the piecewise rigidity loss, and the temporal consistency loss. Through extensive experiments, we demonstrate that our self-supervised framework outperforms all previous self-supervised methods on the motion prediction task.
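The abstract names a masked Chamfer distance loss without defining it. As a rough illustration only, the sketch below computes a standard symmetric Chamfer distance between BEV points warped by a predicted flow and the points of the next frame, keeping only the points selected by a caller-supplied boolean mask. The function names, the mean-of-squared-distances convention and the meaning of the mask are assumptions for illustration; how the paper actually constructs its mask is not described in this abstract, so this is not the authors' formulation.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # a BEV point (x, y); 2-D is an assumption for illustration


def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two BEV points.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def chamfer_distance(src: Sequence[Point], tgt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance,
    src -> tgt plus tgt -> src (one common convention among several)."""
    if not src or not tgt:
        raise ValueError("both point sets must be non-empty")
    # Brute-force O(N * M) nearest-neighbour search; a real system would use
    # a spatial index or batched GPU operations instead.
    fwd = sum(min(_sq_dist(p, q) for q in tgt) for p in src) / len(src)
    bwd = sum(min(_sq_dist(q, p) for p in src) for q in tgt) / len(tgt)
    return fwd + bwd


def masked_chamfer_loss(
    points_t: Sequence[Point],
    flow: Sequence[Point],
    points_next: Sequence[Point],
    mask: Optional[List[bool]] = None,
) -> float:
    """Hypothetical helper: warp frame-t points by the predicted per-point
    flow, keep only points where mask is True, and compare the result with
    the next frame's points using the Chamfer distance above."""
    if mask is None:
        mask = [True] * len(points_t)
    warped = [
        (p[0] + f[0], p[1] + f[1])
        for p, f, keep in zip(points_t, flow, mask)
        if keep
    ]
    if not warped:
        return 0.0  # nothing selected by the mask contributes no loss
    return chamfer_distance(warped, points_next)


if __name__ == "__main__":
    pts_t = [(0.0, 0.0), (1.0, 0.0)]
    pts_next = [(1.0, 0.0), (2.0, 0.0)]
    print(masked_chamfer_loss(pts_t, [(1.0, 0.0), (1.0, 0.0)], pts_next))  # 0.0
    print(masked_chamfer_loss(pts_t, [(0.0, 0.0), (0.0, 0.0)], pts_next))  # 1.0
```

A correct flow moves every point onto a neighbour in the next frame and gives zero loss, while a zero flow leaves each direction with an average squared error of 0.5. Because the Chamfer distance only matches nearest neighbours, it can reward flows that land on the wrong but nearby points, which is one way the "fake flow" problem mentioned above can arise; masking out unreliable points is one plausible way to limit that effect.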

Published

2024-03-24

How to Cite

Fang, S., Liu, Z., Wang, M., Xu, C., Zhong, Y., & Chen, S. (2024). Self-Supervised Bird’s Eye View Motion Prediction with Cross-Modality Signals. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1726–1734. https://doi.org/10.1609/aaai.v38i2.27940

Section

AAAI Technical Track on Computer Vision I