Towards Robust Event-Based Depth Estimation: Bridging Synthetic and Real Domains with Motion Adaptation
DOI: https://doi.org/10.1609/aaai.v40i7.37448
Abstract
Event cameras provide microsecond latency and high dynamic range, making them ideal for 3D perception tasks in traffic scenes with challenging lighting conditions. Yet existing methods often struggle to generalize to out-of-domain environments due to the limited availability of diverse training data. While synthetic data offers an easily accessible alternative, it introduces a significant sim-to-real gap, particularly in motion patterns. We tackle this challenge by introducing Motion-Adaptation Mamba (MA-Mamba), a dual-track framework that advances both architecture and data augmentation. At the architectural level, we introduce a lightweight Spatio-Temporal Association module that captures motion-induced appearance variations at arbitrary scales, and an Adaptive Memory Balancing module, built on the Mamba state-space framework, that adaptively filters memory updates to maintain stable scene context under diverse dynamics. At the data level, we design event-oriented augmentations that simulate varied motion patterns and apply priority-based masked sequence modeling to strengthen long-range spatio-temporal reasoning. Trained solely on synthetic data, MA-Mamba delivers substantial zero-shot gains on multiple real-world benchmarks, demonstrating strong robustness and generalizability.
Published
2026-03-14
How to Cite
Ji, Y., Wang, H., Chen, Y., Cheng, X., Yang, L., & Zheng, X. (2026). Towards Robust Event-Based Depth Estimation: Bridging Synthetic and Real Domains with Motion Adaptation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5323–5331. https://doi.org/10.1609/aaai.v40i7.37448
Section
AAAI Technical Track on Computer Vision IV