StyleProto: Style-Augmented Prototype Learning for Cross-Domain Few-Shot Object Detection

Authors

  • Xi Yang, Xidian University
  • Quantao Xie, Xidian University

DOI:

https://doi.org/10.1609/aaai.v40i14.38160

Abstract

Cross-Domain Few-Shot Object Detection (CD-FSOD) faces significant challenges due to the dual issues of domain shift and limited labeled samples. One major challenge is style bias, caused by limited support samples that fail to represent the target domain’s style diversity. Another is feature confusion, which stems from distribution shifts and limited supervision, manifesting as both object-background ambiguity and object-object confusion. To address these challenges, we propose Style-Augmented Prototype Learning (StyleProto), which constructs style-aware prototypes from support samples with diverse visual styles, and refines them via spatial weighting and discriminative fusion. Specifically, our StyleProto consists of three components: (1) Style Generation Augmentation (SGA); (2) Semantic-Focused Prototype Construction (SPC); (3) Hierarchical Prototype Fusion Aggregator (HPFA). SGA synthesizes style-diverse yet semantically consistent training samples by recombining style statistics from the support set, thus improving robustness to unseen styles. SPC aggregates support features using spatial attention to highlight object semantics and suppress background noise, yielding cleaner and more distinctive class prototypes. HPFA leverages query-guided attention to integrate discriminative support features, enhancing prototype representations with richer class-specific details. Extensive experiments on multiple benchmarks demonstrate that StyleProto consistently outperforms existing state-of-the-art methods.
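The abstract gives no formulas, so the following is only a minimal sketch of the ideas behind SGA and SPC, not the authors' implementation. It assumes that "style statistics" means the channel-wise mean and standard deviation of a feature map (as in AdaIN-style methods), and that "spatial weighting" means attention-weighted average pooling. The function names, the interpolation weight lam, and the flat [channels][positions] list layout are all hypothetical choices made for illustration.

```python
import math


def channel_stats(feat, eps=1e-6):
    """Per-channel mean and std of a feature map stored as [C][N] nested lists."""
    means, stds = [], []
    for channel in feat:
        mu = sum(channel) / len(channel)
        var = sum((v - mu) ** 2 for v in channel) / len(channel)
        means.append(mu)
        stds.append(math.sqrt(var + eps))  # eps avoids division by zero later
    return means, stds


def recombine_style(content, style, lam=0.5):
    """Re-style `content` using statistics mixed with those of `style`.

    ASSUMPTION: style = channel-wise (mean, std). lam=1.0 keeps the content
    style unchanged; lam=0.0 fully adopts the style of the other sample.
    The normalized spatial pattern (the "semantics") is preserved.
    """
    c_mu, c_sd = channel_stats(content)
    s_mu, s_sd = channel_stats(style)
    out = []
    for ch, cm, cs, sm, ss in zip(content, c_mu, c_sd, s_mu, s_sd):
        mix_mu = lam * cm + (1.0 - lam) * sm
        mix_sd = lam * cs + (1.0 - lam) * ss
        # Normalize with the content statistics, then apply the mixed ones.
        out.append([(v - cm) / cs * mix_sd + mix_mu for v in ch])
    return out


def weighted_prototype(feat, attn):
    """Attention-weighted average over spatial positions, one value per channel.

    ASSUMPTION: `attn` holds non-negative weights (not all zero), one per
    position, with larger values on object regions than on background.
    """
    total = sum(attn)
    return [sum(v * a for v, a in zip(ch, attn)) / total for ch in feat]


# Two toy support features: 2 channels, 4 spatial positions each.
support_a = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
support_b = [[10.0, 10.0, 20.0, 20.0], [5.0, 6.0, 7.0, 8.0]]

# A style-augmented variant of support_a, then a prototype that
# attends only to the last two positions (treated here as the object).
augmented = recombine_style(support_a, support_b, lam=0.3)
prototype = weighted_prototype(augmented, [0.0, 0.0, 1.0, 1.0])
```

In this sketch the augmented feature keeps the relative spatial pattern of support_a while its per-channel statistics move toward those of support_b, which is one plausible reading of "style-diverse yet semantically consistent". HPFA is not sketched because the abstract does not say how the query-guided attention is computed.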

Published

2026-03-14

How to Cite

Yang, X., & Xie, Q. (2026). StyleProto: Style-Augmented Prototype Learning for Cross-Domain Few-Shot Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11748–11756. https://doi.org/10.1609/aaai.v40i14.38160

Section

AAAI Technical Track on Computer Vision XI