A Global Occlusion-Aware Approach to Self-Supervised Monocular Visual Odometry

Yao Lu; Xiaoli Xu; Mingyu Ding; Zhiwu Lu; Tao Xiang

doi:10.1609/aaai.v35i3.16325

Authors

Yao Lu Renmin University of China Beijing Key Laboratory of Big Data Management and Analysis Methods
Xiaoli Xu Renmin University of China Beijing Key Laboratory of Big Data Management and Analysis Methods
Mingyu Ding The University of Hong Kong
Zhiwu Lu Renmin University of China Beijing Key Laboratory of Big Data Management and Analysis Methods
Tao Xiang University of Surrey

Keywords:

Vision for Robotics & Autonomous Driving

Abstract

Self-supervised monocular visual odometry (VO) is often cast as a view synthesis problem based on depth and camera pose estimation. One of the key challenges is to estimate depth accurately and robustly when the scene contains occlusions and moving objects. Existing methods simply detect and mask out occluded regions locally using several convolutional layers, and then perform only partial view synthesis on the rest of the image. However, detecting occlusions and moving objects is itself an unsolved problem that requires global layout information, and inaccurate detection inevitably results in incorrect depth and pose estimates. In this work, instead of locally detecting and masking out occlusions and moving objects, we propose to alleviate their negative effects on monocular VO implicitly but more effectively from two global perspectives. First, a multi-scale non-local attention module, consisting of both intra-stage augmented attention and cascaded across-stage attention, is proposed for robust depth estimation in the presence of occlusions; it alleviates their impact via global attention modeling. Second, adversarial learning is introduced into view synthesis for monocular VO. Unlike existing methods that apply pixel-level losses to the quality of synthesized views, we enforce the synthetic view to be indistinguishable from the real one at the scene level. Such a global constraint again helps cope with occluded and moving regions. Extensive experiments on the KITTI dataset show that our approach achieves a new state of the art in both pose estimation and depth recovery.
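To make the idea of global attention modeling concrete, the following is a minimal, dependency-free sketch of a generic non-local (self-attention) operation over a flattened feature map. It is not the multi-scale module proposed in the paper: the intra-stage augmented attention and cascaded across-stage attention are not reproduced, and the learned query, key, and value projections are replaced by identity mappings. The function name `non_local_attention`, the scaled dot-product similarity, and the residual connection are assumptions made purely for illustration.

```python
import math


def non_local_attention(features):
    """Simplified non-local block (illustrative assumption, not the paper's module).

    features: list of N feature vectors, one per spatial position, each of length C.
    Returns N vectors: each input vector plus an attention-weighted sum of
    all positions, so every output depends on the whole feature map.
    """
    c = len(features[0])
    scale = 1.0 / math.sqrt(c)  # scaled dot-product similarity (assumed)
    output = []
    for query in features:
        # Similarity of this position to every position in the map.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        # Numerically stable softmax over all positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Global context: weighted average of all feature vectors.
        context = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(c)
        ]
        # Residual connection keeps the local feature and adds global context.
        output.append([x + g for x, g in zip(query, context)])
    return output
```

In this sketch an H x W feature map would be flattened into N = H * W positions before the call. Because each output position aggregates information from every other position, a feature corrupted by a locally occluded region can still draw on context from the rest of the scene, which is the intuition behind using global rather than purely local operations.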
