What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

Authors

  • Wanfang Su, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Lixing Chen, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Yang Bai, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
  • Xi Lin, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Gaolei Li, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Zhe Qu, School of Computer Science and Engineering, Central South University, Changsha, China
  • Pan Zhou, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China

DOI:

https://doi.org/10.1609/aaai.v38i16.29705

Keywords:

MAS: Coordination and Collaboration, MAS: Multiagent Learning

Abstract

Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources. This paper investigates intermediate collaboration for MAP, focusing on the "good" properties of a collaborative view (i.e., a post-collaboration feature) and its underlying relationship with individual views (i.e., pre-collaboration features), which most existing works treat as an opaque procedure. We propose a novel framework, CMiMC (Contrastive Mutual Information Maximization for Collaborative Perception), for intermediate collaboration. The core philosophy of CMiMC is to preserve the discriminative information of individual views in the collaborative view by maximizing the mutual information between pre- and post-collaboration features, while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks. In particular, we define multi-view mutual information (MVMI) for intermediate collaboration, which evaluates correlations between collaborative views and individual views on both global and local scales. We establish CMiMNet, based on multi-view contrastive learning, to estimate and maximize MVMI, which in turn assists the training of a collaborative encoder for voxel-level feature fusion. We evaluate CMiMC on V2X-Sim 1.0, where it improves the state-of-the-art (SOTA) average precision by 3.08% and 4.44% at Intersection-over-Union (IoU) thresholds of 0.5 and 0.7, respectively. In addition, CMiMC can reduce communication volume to 1/32 while achieving performance comparable to SOTA. Code and Appendix are released at https://github.com/77SWF/CMiMC.
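To make the abstract's core idea concrete, below is a minimal PyTorch sketch (not the authors' released code; see the repository above for that) of the kind of InfoNCE-style contrastive lower bound commonly used to estimate and maximize mutual information between pre-collaboration (individual) and post-collaboration (collaborative) features, at both a global scale (pooled feature vectors) and a local scale (spatially aligned feature locations). Function names such as `global_mi_loss`, the BEV feature shapes, and the sampling scheme are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of contrastive MI maximization between pre- and
# post-collaboration features (assumed shapes and names, not CMiMC's code).
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE lower bound on I(anchor; positives).

    anchor:    (N, D) collaborative-view embeddings.
    positives: (N, D) individual-view embeddings; row i is the positive
               for anchor i, and all other rows serve as negatives.
    """
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchor @ positives.t() / temperature   # (N, N) similarity matrix
    labels = torch.arange(anchor.size(0), device=anchor.device)
    # Maximizing the MI bound <=> minimizing this cross-entropy,
    # whose targets are the diagonal (matched) pairs.
    return F.cross_entropy(logits, labels)

def global_mi_loss(collab_feat: torch.Tensor, indiv_feat: torch.Tensor) -> torch.Tensor:
    """Global scale: pool each (B, C, H, W) feature map to one vector per view."""
    g_collab = collab_feat.mean(dim=(2, 3))   # (B, C)
    g_indiv = indiv_feat.mean(dim=(2, 3))     # (B, C)
    return info_nce(g_collab, g_indiv)

def local_mi_loss(collab_feat: torch.Tensor, indiv_feat: torch.Tensor,
                  n_samples: int = 256) -> torch.Tensor:
    """Local scale: contrast randomly sampled, spatially aligned locations."""
    B, C, H, W = collab_feat.shape
    idx = torch.randint(0, H * W, (n_samples,), device=collab_feat.device)
    c = collab_feat.flatten(2)[..., idx].permute(0, 2, 1).reshape(-1, C)
    i = indiv_feat.flatten(2)[..., idx].permute(0, 2, 1).reshape(-1, C)
    return info_nce(c, i)

if __name__ == "__main__":
    collab = torch.randn(4, 64, 32, 32)   # post-collaboration features
    indiv = torch.randn(4, 64, 32, 32)    # pre-collaboration features
    # A full objective would add the downstream (e.g., detection) loss.
    loss = global_mi_loss(collab, indiv) + local_mi_loss(collab, indiv)
    print(loss.item())
```

In a training loop of this shape, the MI terms would be weighted and added to the downstream task loss, so the fused features stay discriminative for each contributing view while remaining effective for the perception task.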

Published

2024-03-24

How to Cite

Su, W., Chen, L., Bai, Y., Lin, X., Li, G., Qu, Z., & Zhou, P. (2024). What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17550-17558. https://doi.org/10.1609/aaai.v38i16.29705

Issue

Vol. 38 No. 16 (2024)

Section

AAAI Technical Track on Multiagent Systems