NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation

Authors

  • Wei Xie, Nanjing University of Science and Technology
  • Haobo Jiang, Nanyang Technological University
  • Yun Zhu, Nanjing University of Science and Technology
  • Jianjun Qian, Nanjing University of Science and Technology
  • Jin Xie, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v39i14.33612

Abstract

Learning discriminative state representations of agents, encompassing the spatial layout and temporal pose trajectory, is essential for effective navigation decisions. However, existing approaches often rely on simplistic plain networks for navigation information fusion, overlooking the complex long-range dependencies across spatio-temporal cues, which leads to suboptimal state perception and potential decision failures. In this paper, we introduce NaviFormer, an effective encoder-decoder navigation transformer, to aggregate discriminative spatio-temporal context information for object navigation. Our navigation encoder not only encodes spatial layouts and temporal agent poses but also innovatively constructs and encodes a passable frontier map, enriching the original state encoding with cues of potential exploration regions. Furthermore, our navigation decoder employs spatio-temporal self-attention and cross-attention mechanisms to model the dependencies among spatial layout encoding, temporal pose encoding, and passable frontier encoding, thereby facilitating comprehensive contextual state feature aggregation. Finally, we leverage these learned spatio-temporal contextual state representations for PPO-based navigation decisions. Extensive experiments on the Gibson, Habitat-Matterport3D (HM3D) and Matterport3D (MP3D) datasets demonstrate the superiority of our approach.
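The abstract does not say how the passable frontier map is constructed, so the following is only an illustrative sketch under a common assumption from frontier-based exploration: a frontier cell is an explored, obstacle-free cell with at least one unexplored 4-neighbour. The function name `frontier_map`, the two input grids, and the 4-neighbour rule are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def frontier_map(explored: np.ndarray, obstacle: np.ndarray) -> np.ndarray:
    """Boolean map of explored, obstacle-free cells that touch unexplored space.

    Assumption (not from the paper): "passable frontier" is approximated as
    free explored cells with at least one unexplored 4-neighbour.
    """
    explored = explored.astype(bool)
    obstacle = obstacle.astype(bool)
    free = explored & ~obstacle
    unexplored = ~explored

    # Pad with False so cells outside the map never count as unexplored.
    padded = np.pad(unexplored, 1, mode="constant", constant_values=False)

    # OR together the four shifted views: up, down, left, right neighbours.
    near_unexplored = (
        padded[:-2, 1:-1]
        | padded[2:, 1:-1]
        | padded[1:-1, :-2]
        | padded[1:-1, 2:]
    )
    return free & near_unexplored


# Toy 4x4 map: the two left columns are explored, one explored cell is a wall.
explored = np.array([[1, 1, 0, 0]] * 4)
obstacle = np.zeros((4, 4), dtype=int)
obstacle[1, 1] = 1

frontier = frontier_map(explored, obstacle)
print(frontier.astype(int))
```

In this toy map only the free cells in the second column border unexplored space, so they are the ones marked as frontier; the wall cell at row 1 is excluded. A map like this could then be encoded alongside the spatial layout and pose trajectory, which is the role the abstract assigns to the passable frontier encoding.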

Published

2025-04-11

How to Cite

Xie, W., Jiang, H., Zhu, Y., Qian, J., & Xie, J. (2025). NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(14), 14708–14716. https://doi.org/10.1609/aaai.v39i14.33612

Issue

Section

AAAI Technical Track on Intelligent Robots