Diverse Person: Customize Your Own Dataset for Text-Based Person Search
DOI: https://doi.org/10.1609/aaai.v38i5.28298
Keywords: CV: Image and Video Retrieval, CV: Language and Vision
Abstract
Text-based person search is a challenging task aimed at locating specific target pedestrians through text descriptions. Recent advancements have been made in this field, but there remains a deficiency in datasets tailored for text-based person search. The creation of new, real-world datasets is hindered by concerns such as the risk of pedestrian privacy leakage and the substantial costs of annotation. In this paper, we introduce a framework, named Diverse Person (DP), to achieve efficient and high-quality text-based person search data generation without involving privacy concerns. Specifically, we propose to leverage available images of clothing and accessories as reference attribute images to edit the original dataset images through diffusion models. Additionally, we employ a Large Language Model (LLM) to produce annotations that are both high in quality and stylistically consistent with those found in real-world datasets. Extensive experimental results demonstrate that baseline models trained with our DP achieve new state-of-the-art results on three public datasets, with Rank-1 accuracy improvements of up to 4.82%, 2.15%, and 2.28% on CUHK-PEDES, ICFG-PEDES, and RSTPReid, respectively.
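The data-generation pipeline the abstract outlines has two stages: editing source pedestrian images with reference attribute images (clothing, accessories) via a diffusion model, then captioning the edited images with an LLM prompted to match the real dataset's annotation style. A minimal structural sketch of that flow is below; the function names, data shapes, and the stand-in bodies of `diffusion_edit` and `llm_annotate` are illustrative assumptions, not the authors' actual implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    image: str    # path or identifier of a pedestrian image
    caption: str  # text annotation paired with the image

def diffusion_edit(image: str, attribute_ref: str) -> str:
    """Stand-in for the diffusion-based editor: in the paper's setting this
    would swap clothing/accessories in `image` guided by the reference
    attribute image `attribute_ref`."""
    return f"{image}|edited_with({attribute_ref})"

def llm_annotate(image: str, style_examples: List[str]) -> str:
    """Stand-in for the LLM annotator: produces a caption for the edited
    image, conditioned on real captions so the style stays consistent."""
    return f"caption_of({image}) styled on {len(style_examples)} examples"

def diverse_person(dataset: List[Sample],
                   attribute_refs: List[str]) -> List[Sample]:
    """Generate synthetic image-text pairs: one edited image plus an
    LLM caption per (original image, reference attribute) combination."""
    style_examples = [s.caption for s in dataset]
    augmented = []
    for s in dataset:
        for ref in attribute_refs:
            edited = diffusion_edit(s.image, ref)
            augmented.append(Sample(edited, llm_annotate(edited, style_examples)))
    return augmented
```

The sketch shows only the data flow: the synthetic pairs it yields would be mixed into training for a text-based person search baseline, which is where the reported Rank-1 gains come from.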
Published: 2024-03-24
How to Cite
Song, Z., Hu, G., & Zhao, C. (2024). Diverse Person: Customize Your Own Dataset for Text-Based Person Search. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4943–4951. https://doi.org/10.1609/aaai.v38i5.28298
Section: AAAI Technical Track on Computer Vision IV