A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification

Authors

  • Yunpeng Gong, Xiamen University
  • Yongjie Hou, Xiamen University
  • Jiangming Shi, Xiamen University
  • Kim Long Diep, Xiamen University
  • Min Jiang, Xiamen University

DOI:

https://doi.org/10.1609/aaai.v40i6.42425

Abstract

Sketch-based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to severe modality gaps and limited labeled data. To address this, we propose KTCAA, a theoretically inspired framework for few-shot cross-modal generalization. Drawing on generalization bounds, we identify two key factors affecting target risk: (1) domain discrepancy, reflecting the alignment difficulty between source and target distributions; and (2) perturbation invariance, measuring the model’s robustness to modality shifts. Accordingly, we design: (1) Alignment Augmentation (AA), which applies localized sketch-style transformations to simulate target distributions and guide progressive alignment; and (2) Knowledge Transfer Catalyst (KTC), which enhances perturbation invariance by introducing worst-case modality perturbations and enforcing consistency. These modules are jointly optimized within a meta-learning paradigm that transfers alignment knowledge from data-abundant RGB domains to sketch scenarios. Experiments on multiple benchmarks show that KTCAA achieves state-of-the-art performance, particularly under data-scarce conditions.
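To make the Knowledge Transfer Catalyst idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of "worst-case modality perturbation with consistency enforcement": given a toy encoder, we search an L-infinity ball for the perturbation that most disturbs the embedding, which is the perturbation a robust model would then be trained to be invariant to. The encoder, loss, and random-search procedure are all illustrative assumptions; the paper's actual modules operate on deep features with gradient-based optimization.

```python
import numpy as np

def embed(x, W):
    # Toy linear "encoder": project inputs to an embedding and L2-normalize.
    # Stands in for the deep feature extractor assumed by the framework.
    z = x @ W
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-8)

def consistency_loss(x, x_pert, W):
    # Mean squared distance between embeddings of clean and perturbed inputs.
    # Perturbation invariance means keeping this small.
    return float(np.mean((embed(x, W) - embed(x_pert, W)) ** 2))

def worst_case_perturbation(x, W, eps=0.1, steps=20, lr=0.05, seed=0):
    # Gradient-free random search for a loss-maximizing ("worst-case")
    # perturbation inside an L-infinity ball of radius eps. A real
    # implementation would typically use gradient ascent (e.g. PGD-style).
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    best = consistency_loss(x, x + delta, W)  # zero for delta = 0
    for _ in range(steps):
        cand = np.clip(delta + lr * rng.standard_normal(x.shape), -eps, eps)
        loss = consistency_loss(x, x + cand, W)
        if loss > best:
            best, delta = loss, cand
    return delta, best

# Illustrative usage: find the worst-case perturbation for a random batch.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16))   # a batch of 4 toy "images"
W = rng.standard_normal((16, 8))   # toy encoder weights
delta, worst_loss = worst_case_perturbation(x, W, eps=0.1)
```

Training would then minimize `consistency_loss(x, x + delta, W)` over the encoder weights, encouraging embeddings that are stable under the strongest in-budget modality shift.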

Published

2026-03-14

How to Cite

Gong, Y., Hou, Y., Shi, J., Diep, K. L., & Jiang, M. (2026). A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4284–4292. https://doi.org/10.1609/aaai.v40i6.42425

Section

AAAI Technical Track on Computer Vision III