A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification

Authors

  • Yunpeng Gong, Xiamen University
  • Yongjie Hou, Xiamen University
  • Jiangming Shi, Xiamen University
  • Kim Long Diep, Xiamen University
  • Min Jiang, Xiamen University

DOI:

https://doi.org/10.1609/aaai.v40i6.42425

Abstract

Sketch-based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to severe modality gaps and limited labeled data. To address this, we propose KTCAA, a theoretically inspired framework for few-shot cross-modal generalization. Drawing on generalization bounds, we identify two key factors affecting target risk: (1) domain discrepancy, reflecting the alignment difficulty between source and target distributions; and (2) perturbation invariance, measuring the model’s robustness to modality shifts. Accordingly, we design: (1) Alignment Augmentation (AA), which applies localized sketch-style transformations to simulate target distributions and guide progressive alignment; and (2) Knowledge Transfer Catalyst (KTC), which enhances perturbation invariance by introducing worst-case modality perturbations and enforcing consistency. These modules are jointly optimized within a meta-learning paradigm that transfers alignment knowledge from data-abundant RGB domains to sketch scenarios. Experiments on multiple benchmarks show that KTCAA achieves state-of-the-art performance, particularly under data-scarce conditions.
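To make the Knowledge Transfer Catalyst idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of "worst-case modality perturbation with consistency enforcement": given a toy encoder, we search an L-infinity ball for the perturbation that most disturbs the embedding, which is the perturbation a robust model would then be trained to be invariant to. The encoder, loss, and random-search procedure are all illustrative assumptions; the paper's actual modules operate on deep features with gradient-based optimization.

```python
import numpy as np

def embed(x, W):
    # Toy linear "encoder": project inputs to an embedding and L2-normalize.
    # Stands in for the deep feature extractor assumed by the framework.
    z = x @ W
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-8)

def consistency_loss(x, x_pert, W):
    # Mean squared distance between embeddings of clean and perturbed inputs.
    # Perturbation invariance means keeping this small.
    return float(np.mean((embed(x, W) - embed(x_pert, W)) ** 2))

def worst_case_perturbation(x, W, eps=0.1, steps=20, lr=0.05, seed=0):
    # Gradient-free random search for a loss-maximizing ("worst-case")
    # perturbation inside an L-infinity ball of radius eps. A real
    # implementation would typically use gradient ascent (e.g. PGD-style).
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    best = consistency_loss(x, x + delta, W)  # zero for delta = 0
    for _ in range(steps):
        cand = np.clip(delta + lr * rng.standard_normal(x.shape), -eps, eps)
        loss = consistency_loss(x, x + cand, W)
        if loss > best:
            best, delta = loss, cand
    return delta, best

# Illustrative usage: find the worst-case perturbation for a random batch.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16))   # a batch of 4 toy "images"
W = rng.standard_normal((16, 8))   # toy encoder weights
delta, worst_loss = worst_case_perturbation(x, W, eps=0.1)
```

Training would then minimize `consistency_loss(x, x + delta, W)` over the encoder weights, encouraging embeddings that are stable under the strongest in-budget modality shift.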

Published

2026-03-14

How to Cite

Gong, Y., Hou, Y., Shi, J., Diep, K. L., & Jiang, M. (2026). A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4284–4292. https://doi.org/10.1609/aaai.v40i6.42425

Section

AAAI Technical Track on Computer Vision III