From Discriminative to Generative: A Diffusion-Based Paradigm for Multi-Agent Collaborative Perception

Authors

  • Kexin Gong, Beijing University of Posts and Telecommunications
  • Puyi Yao, Beijing University of Posts and Telecommunications
  • Guiyang Luo, Beijing University of Posts and Telecommunications
  • Quan Yuan, Beijing University of Posts and Telecommunications
  • Tiange Fu, Beijing University of Posts and Telecommunications
  • Hui Zhang, Beijing Jiaotong University
  • Jinglin Li, Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v40i6.42423

Abstract

Collaborative perception leveraging intermediate feature fusion has emerged as a leading paradigm to significantly enhance the environmental perception capabilities of autonomous driving systems. However, existing methods typically rely on discriminative supervision guided by downstream tasks. This paradigm compels models to learn minimal, task-specific representations, which conflicts with the goal of cooperative perception to capture comprehensive information, thereby limiting generalization. To address this issue, we propose DiGS-CP, a novel two-stage generative supervised collaborative perception framework. Specifically, we introduce a diffusion-based generative task that conditions on fused object-level features to generate representations of object-level point clouds. The proposed generative supervision provides fine-grained, task-agnostic signals that encourage the fusion module to learn comprehensive representations beyond task-specific requirements. By preserving and integrating complementary information from collaborative agents, our approach overcomes the limitations of task-specific learning and enhances the generalizability of the learned features. Furthermore, our two-stage architecture requires agents to transmit only object-level features, significantly reducing communication overhead. Extensive experiments on three benchmark datasets demonstrate that DiGS-CP achieves state-of-the-art performance in 3D object detection, while maintaining low bandwidth requirements and exhibiting excellent generalization ability.
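
As a rough illustration of what a diffusion-based generative task conditioned on fused object-level features could look like, the sketch below computes a standard DDPM-style noise-prediction loss. It is not the authors' implementation: the abstract does not specify the loss, the noise schedule, or the network, so the linear schedule, the noise-prediction objective, the toy linear denoiser, and all names and dimensions here (`make_alpha_bar`, `add_noise`, `toy_denoiser`, `generative_loss`, `point_dim`, `feat_dim`) are assumptions chosen only for illustration.

```python
import numpy as np

def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def toy_denoiser(x_t, t_frac, cond, weights):
    """Hypothetical linear stand-in for a denoising network.

    It sees the noisy target, the conditioning feature, and the
    normalized timestep, and predicts the noise that was added.
    """
    inputs = np.concatenate([x_t, cond, [t_frac]])
    return weights @ inputs

def generative_loss(x0, cond, weights, alpha_bar, rng):
    """Mean squared error between predicted and true noise at a random timestep."""
    t = int(rng.integers(0, len(alpha_bar)))
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = toy_denoiser(x_t, t / len(alpha_bar), cond, weights)
    return float(np.mean((eps_hat - eps) ** 2))

rng = np.random.default_rng(0)
point_dim, feat_dim = 16, 8                 # illustrative sizes only
x0 = rng.standard_normal(point_dim)         # placeholder for an object-level point-cloud representation
cond = rng.standard_normal(feat_dim)        # placeholder for a fused object-level feature
weights = 0.1 * rng.standard_normal((point_dim, point_dim + feat_dim + 1))
alpha_bar = make_alpha_bar()
loss = generative_loss(x0, cond, weights, alpha_bar, rng)
```

In a setup like this, gradients of the loss with respect to the conditioning feature would flow back into whatever module produced it, which is one plausible way a generative objective could supervise a fusion module without reference to a downstream detection label.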

Published

2026-03-14

How to Cite

Gong, K., Yao, P., Luo, G., Yuan, Q., Fu, T., Zhang, H., & Li, J. (2026). From Discriminative to Generative: A Diffusion-Based Paradigm for Multi-Agent Collaborative Perception. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4266–4274. https://doi.org/10.1609/aaai.v40i6.42423

Section

AAAI Technical Track on Computer Vision III