A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification
DOI:
https://doi.org/10.1609/aaai.v40i6.42425
Abstract
Sketch-based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to severe modality gaps and limited labeled data. To address this, we propose KTCAA, a theory-inspired framework for few-shot cross-modal generalization. Drawing on generalization bounds, we identify two key factors affecting target risk: (1) domain discrepancy, reflecting the alignment difficulty between source and target distributions; and (2) perturbation invariance, measuring the model’s robustness to modality shifts. Accordingly, we design: (1) Alignment Augmentation (AA), which applies localized sketch-style transformations to simulate target distributions and guide progressive alignment; and (2) Knowledge Transfer Catalyst (KTC), which enhances perturbation invariance by introducing worst-case modality perturbations and enforcing consistency. These modules are jointly optimized within a meta-learning paradigm that transfers alignment knowledge from data-abundant RGB domains to sketch scenarios. Experiments on multiple benchmarks show that KTCAA achieves state-of-the-art performance, particularly under data-scarce conditions.
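The perturbation-invariance idea behind KTC can be illustrated with a minimal sketch: find a worst-case perturbation of the input within a small norm ball (here via projected gradient ascent on a linear feature extractor), then penalize the resulting feature drift with a consistency loss. The function names, the linear model, and the L2-ball formulation are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def worst_case_perturbation(W, x, eps=0.1, steps=5, lr=0.05, seed=0):
    """Find a perturbation delta maximizing the feature drift
    ||W(x + delta) - W x||^2 within an eps-radius L2 ball, via projected
    gradient ascent. A simple stand-in for KTC's worst-case modality
    perturbation; the paper's actual construction may differ."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(scale=1e-3, size=x.shape)  # small random start
    for _ in range(steps):
        # gradient of ||W delta||^2 with respect to delta is 2 W^T W delta
        grad = 2.0 * W.T @ (W @ delta)
        delta = delta + lr * grad
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)  # project back onto the eps-ball
    return delta

def consistency_loss(W, x, delta):
    """Penalize feature drift under the perturbation (invariance term)."""
    return float(np.sum((W @ (x + delta) - W @ x) ** 2))
```

In training, the consistency term would be added to the usual re-identification objective, encouraging features that stay stable under the simulated modality shift.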
Published
2026-03-14
How to Cite
Gong, Y., Hou, Y., Shi, J., Diep, K. L., & Jiang, M. (2026). A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4284–4292. https://doi.org/10.1609/aaai.v40i6.42425
Issue
Section
AAAI Technical Track on Computer Vision III