Diverse Person: Customize Your Own Dataset for Text-Based Person Search

Authors

  • Zifan Song, Tongji University
  • Guosheng Hu, Oosto
  • Cairong Zhao, Tongji University

DOI:

https://doi.org/10.1609/aaai.v38i5.28298

Keywords:

CV: Image and Video Retrieval, CV: Language and Vision

Abstract

Text-based person search is a challenging task that aims to locate specific target pedestrians from text descriptions. Despite recent advances in this field, there remains a shortage of datasets tailored to text-based person search, and the creation of new real-world datasets is hindered by the risk of pedestrian privacy leakage and the substantial cost of annotation. In this paper, we introduce a framework, named Diverse Person (DP), that generates text-based person search data efficiently and at high quality without raising privacy concerns. Specifically, we leverage available images of clothing and accessories as reference attribute images to edit the original dataset images through diffusion models. Additionally, we employ a Large Language Model (LLM) to produce annotations that are both high in quality and stylistically consistent with those found in real-world datasets. Extensive experimental results demonstrate that baseline models trained with our DP achieve new state-of-the-art results on three public datasets, with Rank-1 accuracy improvements of up to 4.82%, 2.15%, and 2.28% on CUHK-PEDES, ICFG-PEDES, and RSTPReid, respectively.

Published

2024-03-24

How to Cite

Song, Z., Hu, G., & Zhao, C. (2024). Diverse Person: Customize Your Own Dataset for Text-Based Person Search. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4943-4951. https://doi.org/10.1609/aaai.v38i5.28298

Section

AAAI Technical Track on Computer Vision IV