SparseCoop: Cooperative Perception with Kinematic-Grounded Queries

Authors

  • Jiahao Wang, Tsinghua University
  • Zhongwei Jiang, Nanyang Technological University
  • Wenchao Sun, Tsinghua University
  • Jiaru Zhong, Hong Kong Polytechnic University
  • Haibao Yu, The University of Hong Kong
  • Yuner Zhang, University of Pennsylvania
  • Chenyang Lu, Tsinghua University
  • Chuang Zhang, Tsinghua University
  • Lei He, Tsinghua University
  • Shaobing Xu, Tsinghua University
  • Jianqiang Wang, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v40i12.37952

Abstract

Cooperative perception is critical for autonomous driving because it overcomes the inherent limitations of a single vehicle, such as occlusions and constrained fields of view. However, current approaches that share dense Bird's-Eye-View (BEV) features are constrained by quadratically scaling communication costs and lack the flexibility and interpretability needed for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that discards intermediate BEV representations entirely. Our framework features three innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module that effectively integrates information from both matched and unmatched instances; and a cooperative instance denoising task that provides stable, abundant supervision to accelerate and stabilize training. Experiments on the V2X-Seq and Griffin datasets show that SparseCoop achieves state-of-the-art performance. Notably, it does so with superior computational efficiency and a highly competitive transmission cost, while showing remarkable robustness to real-world challenges such as communication latency.
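The abstract credits precise spatio-temporal alignment to an explicit state vector carrying 3D geometry and velocity. The Python sketch below illustrates why such a state makes alignment straightforward: a received instance is first extrapolated over the communication latency with a constant-velocity model, then mapped into the ego frame with a rigid transform. Everything here is an assumption for illustration, not the paper's implementation: the state layout (`InstanceState`), the yaw-only sender-to-ego transform, the constant-velocity motion model, and the function names `align_to_ego` and `wrap_angle` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceState:
    """Hypothetical explicit state of one instance query (illustrative layout only)."""
    x: float    # box center, metres
    y: float
    z: float
    w: float    # box width
    l: float    # box length
    h: float    # box height
    yaw: float  # heading, radians
    vx: float   # velocity along x, m/s
    vy: float   # velocity along y, m/s


def wrap_angle(a: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def align_to_ego(s: InstanceState, dt: float, theta: float,
                 tx: float, ty: float, tz: float) -> InstanceState:
    """Map a sender-frame state into the ego frame at the ego timestamp.

    dt: ego timestamp minus sender timestamp (the latency), in seconds.
    theta, tx, ty, tz: assumed yaw-only rigid transform from the sender
    frame to the ego frame, valid at the ego timestamp.
    """
    # 1) Temporal alignment: constant-velocity extrapolation in the sender frame.
    px = s.x + s.vx * dt
    py = s.y + s.vy * dt
    # 2) Spatial alignment: rotate position and velocity, translate position only.
    c, sn = math.cos(theta), math.sin(theta)
    ex = c * px - sn * py + tx
    ey = sn * px + c * py + ty
    evx = c * s.vx - sn * s.vy
    evy = sn * s.vx + c * s.vy
    # Box size is frame-invariant; the heading shifts by the frame rotation.
    return InstanceState(ex, ey, s.z + tz, s.w, s.l, s.h,
                         wrap_angle(s.yaw + theta), evx, evy)


# Usage: an object 10 m ahead of the sender, moving at 5 m/s along the sender's
# x-axis, received 0.2 s late; the sender frame is rotated 90 degrees from ego.
s = InstanceState(10.0, 0.0, 0.0, 2.0, 4.5, 1.6, 0.0, 5.0, 0.0)
e = align_to_ego(s, dt=0.2, theta=math.pi / 2, tx=1.0, ty=2.0, tz=0.0)
# e.x ≈ 1.0, e.y ≈ 13.0, (e.vx, e.vy) ≈ (0.0, 5.0)
```

In this sketch, alignment updates a handful of numbers per instance rather than resampling a dense feature map, which is the kind of flexibility and interpretability the abstract contrasts with BEV sharing. A learned system would still have to cope with velocity estimation error and pose noise, which this example ignores.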

Published

2026-03-14

How to Cite

Wang, J., Jiang, Z., Sun, W., Zhong, J., Yu, H., Zhang, Y., … Wang, J. (2026). SparseCoop: Cooperative Perception with Kinematic-Grounded Queries. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 9876–9884. https://doi.org/10.1609/aaai.v40i12.37952

Section

AAAI Technical Track on Computer Vision IX