From Discriminative to Generative: A Diffusion-Based Paradigm for Multi-Agent Collaborative Perception

Kexin Gong; Puyi Yao; Guiyang Luo; Quan Yuan; Tiange Fu; Hui Zhang; Jinglin Li

doi:10.1609/aaai.v40i6.42423

Authors

Kexin Gong Beijing University of Posts and Telecommunications
Puyi Yao Beijing University of Posts and Telecommunications
Guiyang Luo Beijing University of Posts and Telecommunications
Quan Yuan Beijing University of Posts and Telecommunications
Tiange Fu Beijing University of Posts and Telecommunications
Hui Zhang Beijing Jiaotong University
Jinglin Li Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v40i6.42423

Abstract

Collaborative perception leveraging intermediate feature fusion has emerged as a leading paradigm to significantly enhance the environmental perception capabilities of autonomous driving systems. However, existing methods typically rely on discriminative supervision guided by downstream tasks. This paradigm compels models to learn minimal, task-specific representations, which conflicts with cooperative perception's goal of capturing comprehensive information, thereby limiting generalization. To address this issue, we propose DiGS-CP, a novel two-stage generative supervised collaborative perception framework. Specifically, we introduce a diffusion-based generative task that conditions on fused object-level features to generate representations of object-level point clouds. The proposed generative supervision provides fine-grained, task-agnostic signals that encourage the fusion module to learn comprehensive representations beyond task-specific requirements. By preserving and integrating complementary information from collaborative agents, our approach overcomes the limitations of task-specific learning and enhances the generalizability of the learned features. Furthermore, our two-stage architecture requires agents to transmit only object-level features, significantly reducing communication overhead. Extensive experiments on three benchmark datasets demonstrate that DiGS-CP achieves state-of-the-art performance in 3D object detection, while maintaining low bandwidth requirements and exhibiting excellent generalization ability.
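
To make the idea of diffusion-based generative supervision concrete, the following is a minimal sketch of a generic conditional noise-prediction objective in the style of standard denoising diffusion models. It is not the DiGS-CP implementation: the abstract does not specify the noise schedule, network, or loss, so the linear schedule, the function names (`make_alpha_bar`, `add_noise`, `toy_denoiser`, `generative_loss`), and the linear stand-in denoiser are all assumptions made for illustration. Here `x0` plays the role of an object-level point cloud representation and `cond` the role of the fused object-level feature.

```python
import numpy as np


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
    return x_t, eps


def toy_denoiser(x_t, t, cond, weights, num_steps):
    """Hypothetical linear stand-in for a learned denoising network.

    It sees the noisy target, the conditioning feature, and the
    normalised timestep, and returns a noise estimate shaped like x_t.
    """
    inputs = np.concatenate([x_t, cond, [t / num_steps]])
    return weights @ inputs


def generative_loss(x0, cond, t, alpha_bar, weights, rng):
    """Mean squared error between the true and the predicted noise."""
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = toy_denoiser(x_t, t, cond, weights, len(alpha_bar))
    return float(np.mean((eps_hat - eps) ** 2))
```

In a sketch like this, the loss can only be reduced if `cond` carries enough information to help recover `x0` from its noisy version, which is the intuition behind using a generative task to push the fusion module toward more comprehensive features.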
