SparseCoop: Cooperative Perception with Kinematic-Grounded Queries
DOI: https://doi.org/10.1609/aaai.v40i12.37952

Abstract
Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields of view. However, current approaches that share dense Bird's-Eye-View (BEV) features are constrained by communication costs that scale quadratically, and they lack the flexibility and interpretability needed for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module that effectively integrates information from both matched and unmatched instances; and a cooperative instance denoising task that provides stable, abundant supervision to accelerate and stabilize training. Experiments on the V2X-Seq and Griffin datasets show that SparseCoop achieves state-of-the-art performance. Notably, it delivers this performance with superior computational efficiency and a highly competitive transmission cost, while showing remarkable robustness to real-world challenges such as communication latency.

Published
2026-03-14
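The abstract's kinematic-grounded query carries an explicit state vector (3D geometry plus velocity), which is what lets a query received from a delayed agent be re-projected to the ego timestamp before fusion. A minimal sketch of that alignment step under a constant-velocity assumption — the state layout, field order, and function name here are illustrative, not the paper's actual implementation:

```python
import numpy as np

def propagate_queries(states: np.ndarray, dt: float) -> np.ndarray:
    """Constant-velocity propagation of hypothetical query state vectors.

    Each row is assumed to be [x, y, z, w, l, h, yaw, vx, vy]: 3D centre,
    box size, heading, and planar velocity. Positions are advanced by
    velocity * dt so queries sent with latency dt can be aligned to the
    ego vehicle's current timestamp before fusion.
    """
    out = states.copy()
    out[:, 0] += states[:, 7] * dt  # x advances by vx * dt
    out[:, 1] += states[:, 8] * dt  # y advances by vy * dt
    return out

# Example: align a query received with 100 ms of communication latency.
states = np.array([[10.0, 2.0, 0.5, 1.8, 4.5, 1.6, 0.0, 5.0, -1.0]])
aligned = propagate_queries(states, dt=0.1)
```

Because the state is explicit rather than an entangled BEV feature, this compensation is a cheap closed-form update per instance, which is the interpretability and alignment benefit the abstract claims over dense feature sharing.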
How to Cite
Wang, J., Jiang, Z., Sun, W., Zhong, J., Yu, H., Zhang, Y., … Wang, J. (2026). SparseCoop: Cooperative Perception with Kinematic-Grounded Queries. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 9876–9884. https://doi.org/10.1609/aaai.v40i12.37952
Section
AAAI Technical Track on Computer Vision IX