A Training-free Synthetic Data Selection Method for Semantic Segmentation

Hao Tang; Siyue Yu; Jian Pang; Bingfeng Zhang

doi:10.1609/aaai.v39i7.32777

Authors

Hao Tang China University of Petroleum (East China)
Siyue Yu Xi'an Jiaotong-Liverpool University
Jian Pang China University of Petroleum (East China)
Bingfeng Zhang China University of Petroleum (East China)

DOI:

https://doi.org/10.1609/aaai.v39i7.32777

Abstract

Training semantic segmenters with synthetic data has attracted great attention because such data is easy to obtain and available in large quantities. Most previous methods focus on producing large-scale synthetic image-annotation samples and then training the segmenter on all of them. However, this approach faces a major challenge: poor-quality samples are unavoidable, and training on them damages the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy that uses CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of each synthetic image, thereby removing samples with low-quality images. We then propose a class-balanced Annotation Similarity Filter (ASF) that compares the synthetic annotation with the response of CLIP to remove samples with low-quality annotations. Experimental results show that our method reduces the data size by half, while the trained segmenter achieves higher performance.
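The two-stage selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the PCS or ASF scores are computed, so the sketch treats them as precomputed numbers. The `Sample` fields, the `select_samples` function, the fixed image threshold, and the per-class keep ratio are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    cls: str            # class tied to the synthetic annotation (assumed single-class)
    image_score: float  # hypothetical stand-in for a PCS-style image reliability score
    annot_score: float  # hypothetical stand-in for an ASF-style annotation agreement score


def select_samples(samples, image_threshold=0.5, keep_ratio=0.5):
    """Training-free two-stage filter; both parameters are illustrative assumptions."""
    # Stage 1: drop samples whose image score marks the image as unreliable.
    stage1 = [s for s in samples if s.image_score >= image_threshold]

    # Stage 2: group the survivors by class so filtering is applied per class
    # and no class is removed wholesale.
    by_class = defaultdict(list)
    for s in stage1:
        by_class[s.cls].append(s)

    kept = []
    for group in by_class.values():
        # Keep the top fraction of each class, ranked by annotation score.
        group.sort(key=lambda s: s.annot_score, reverse=True)
        n_keep = max(1, math.ceil(len(group) * keep_ratio))
        kept.extend(group[:n_keep])
    return kept


# Toy data with made-up scores, for illustration only.
samples = [
    Sample("a", "cat", 0.9, 0.8),
    Sample("b", "cat", 0.8, 0.3),
    Sample("c", "cat", 0.2, 0.9),
    Sample("d", "dog", 0.7, 0.6),
    Sample("e", "dog", 0.6, 0.1),
    Sample("f", "dog", 0.9, 0.4),
]
selected = select_samples(samples)
print(sorted(s.name for s in selected))
```

In this toy run, sample `c` is dropped in the first stage despite its high annotation score, because its image score falls below the threshold. Ranking within each class, rather than over the whole pool, is one simple way to keep the filtered set class-balanced.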
