Diverse Person: Customize Your Own Dataset for Text-Based Person Search

Authors

  • Zifan Song, Tongji University
  • Guosheng Hu, Oosto
  • Cairong Zhao, Tongji University

DOI:

https://doi.org/10.1609/aaai.v38i5.28298

Keywords:

CV: Image and Video Retrieval, CV: Language and Vision

Abstract

Text-based person search is a challenging task that aims to locate specific target pedestrians from text descriptions. Despite recent advances in this field, there remains a shortage of datasets tailored to text-based person search, and the creation of new real-world datasets is hindered by the risk of pedestrian privacy leakage and the substantial cost of annotation. In this paper, we introduce a framework, named Diverse Person (DP), that generates text-based person search data efficiently and at high quality without raising privacy concerns. Specifically, we leverage available images of clothing and accessories as reference attribute images to edit the original dataset images through diffusion models. Additionally, we employ a Large Language Model (LLM) to produce annotations that are both high in quality and stylistically consistent with those found in real-world datasets. Extensive experimental results demonstrate that baseline models trained with our DP achieve new state-of-the-art results on three public datasets, with Rank-1 accuracy improvements of up to 4.82%, 2.15%, and 2.28% on CUHK-PEDES, ICFG-PEDES, and RSTPReid, respectively.

Published

2024-03-24

How to Cite

Song, Z., Hu, G., & Zhao, C. (2024). Diverse Person: Customize Your Own Dataset for Text-Based Person Search. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4943-4951. https://doi.org/10.1609/aaai.v38i5.28298

Section

AAAI Technical Track on Computer Vision IV