Unified Interaction Consistency Learning for Single-Source Domain-Generalized Object Detection in Urban Scene

Authors

  • Peng Zhang, School of Automation, Northwestern Polytechnical University, Xi'an, China
  • Xiang Yuan, School of Automation, Northwestern Polytechnical University, Xi'an, China
  • Gong Cheng, School of Automation, Northwestern Polytechnical University, Xi'an, China

DOI

https://doi.org/10.1609/aaai.v40i15.38263

Abstract

Domain generalization remains a critical challenge for deploying neural networks, particularly for out-of-distribution object detection. The distributional discrepancy between the training domain (e.g., daytime-sunny) and real-world conditions (e.g., night-rainy) inevitably leads to imprecise localization and misclassification. To address these issues, we propose a unified interaction consistency learning (UICL) framework, a novel single-source domain-generalized method designed to learn intra-class domain-invariant representations. Specifically, we introduce a cross-domain interaction mechanism that exchanges region proposals between the original and augmented pipelines, enriching the diversity of instance-level representations. Building upon this, we propose prediction-guided consistency learning to unify the interaction mechanism and harmonize the cross-domain representations, contributing to a discriminative prediction distribution under domain shift. In addition, we devise a cyclic interaction resilient detection strategy, which mitigates inaccurate predictions caused by partial occlusion and ambiguous boundaries across different domains. Extensive experiments demonstrate that UICL significantly improves the robustness of detectors across several target domains, achieving state-of-the-art generalization performance on the diverse weather benchmark.
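
The abstract does not give the loss formulation, so the following is only a minimal illustrative sketch of the general idea of proposal exchange followed by prediction consistency, not the UICL implementation. It assumes, hypothetically, that both the original and the augmented branch score one shared proposal set and that consistency is measured with a symmetric KL divergence between their per-proposal class distributions. The function names (exchange_proposals, symmetric_consistency_loss) and the choice of symmetric KL are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def exchange_proposals(props_orig, props_aug):
    """Hypothetical exchange step: merge the proposals of both branches
    into one shared, order-preserving, duplicate-free list so that each
    branch can score the other branch's regions as well as its own."""
    shared, seen = [], set()
    for box in list(props_orig) + list(props_aug):
        if box not in seen:
            seen.add(box)
            shared.append(box)
    return shared


def symmetric_consistency_loss(logits_orig, logits_aug):
    """Assumed consistency term: mean symmetric KL divergence between the
    class distributions that the two branches predict for the same proposals."""
    if len(logits_orig) != len(logits_aug):
        raise ValueError("both branches must score the same proposals")
    if not logits_orig:
        return 0.0
    total = 0.0
    for lo, la in zip(logits_orig, logits_aug):
        p, q = softmax(lo), softmax(la)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(logits_orig)


if __name__ == "__main__":
    # Boxes are (x1, y1, x2, y2) tuples; the values are made up for the demo.
    shared = exchange_proposals([(0, 0, 10, 10)], [(0, 0, 10, 10), (5, 5, 20, 20)])
    print(len(shared), "shared proposals")
    # One row of class logits per shared proposal, per branch.
    print(symmetric_consistency_loss([[2.0, 0.0], [1.0, 1.0]],
                                     [[0.0, 2.0], [1.0, 1.0]]))
```

Under these assumptions the loss is zero when the two branches agree on every proposal and grows as their predicted class distributions diverge, which is the behavior a consistency objective of this kind is meant to penalize.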

Published

2026-03-14

How to Cite

Zhang, P., Yuan, X., & Cheng, G. (2026). Unified Interaction Consistency Learning for Single-Source Domain-Generalized Object Detection in Urban Scene. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12672–12680. https://doi.org/10.1609/aaai.v40i15.38263

Section

AAAI Technical Track on Computer Vision XII