Self-Supervised Bird’s Eye View Motion Prediction with Cross-Modality Signals

Shaoheng Fang; Zuhong Liu; Mingyu Wang; Chenxin Xu; Yiqi Zhong; Siheng Chen

doi:10.1609/aaai.v38i2.27940

Authors

Shaoheng Fang Shanghai Jiao Tong University
Zuhong Liu Shanghai Jiao Tong University
Mingyu Wang University of Chinese Academy of Sciences
Chenxin Xu Shanghai Jiao Tong University
Yiqi Zhong University of Southern California
Siheng Chen Shanghai Jiao Tong University; Shanghai AI Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i2.27940

Keywords:

CV: Vision for Robotics & Autonomous Driving, ML: Unsupervised & Self-Supervised Learning

Abstract

Learning the dense bird’s eye view (BEV) motion flow in a self-supervised manner is an emerging research topic in robotics and autonomous driving. Current self-supervised methods mainly rely on point correspondences between point clouds, which may introduce the problems of fake flow and inconsistency, hindering the model’s ability to learn accurate and realistic motion. In this paper, we introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data to obtain supervision signals. We design three innovative supervision signals to preserve the inherent properties of scene motion: the masked Chamfer distance loss, the piecewise rigidity loss, and the temporal consistency loss. Through extensive experiments, we demonstrate that our proposed self-supervised framework outperforms all previous self-supervised methods for the motion prediction task.
