A Training-free Synthetic Data Selection Method for Semantic Segmentation

Authors

  • Hao Tang, China University of Petroleum (East China)
  • Siyue Yu, Xi'an Jiaotong-Liverpool University
  • Jian Pang, China University of Petroleum (East China)
  • Bingfeng Zhang, China University of Petroleum (East China)

DOI:

https://doi.org/10.1609/aaai.v39i7.32777

Abstract

Training semantic segmenters with synthetic data has attracted great attention because such data is easy to obtain in large quantities. Most previous methods focus on producing large-scale synthetic image-annotation samples and then training the segmenter on all of them. However, this approach faces a major challenge: poor-quality samples are unavoidable, and training on them harms the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy that uses CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of each synthetic image, thus removing samples with low-quality images. We then propose a class-balanced Annotation Similarity Filter (ASF) that compares each synthetic annotation with the response of CLIP to remove samples with low-quality annotations. Experimental results show that our method reduces the data size by half while the trained segmenter achieves higher performance.
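The two-stage selection described in the abstract might be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the abstract does not specify how PCS or ASF scores are computed, so `image_score` and `annotation_score` are hypothetical precomputed inputs, and `image_threshold`, `keep_ratio`, and the use of mask IoU as an annotation agreement measure are all assumptions made for the sake of the example.

```python
import math
from collections import defaultdict
from typing import List, NamedTuple


class Sample(NamedTuple):
    sample_id: str
    class_name: str
    image_score: float       # hypothetical stand-in for a PCS-style image reliability score
    annotation_score: float  # hypothetical stand-in for an ASF-style annotation agreement score


def mask_iou(mask_a: List[List[int]], mask_b: List[List[int]]) -> float:
    """Intersection-over-union of two equally shaped binary masks.

    Assumption: IoU is one plausible way to compare a synthetic annotation
    with a CLIP-derived response mask; the abstract does not name the measure.
    """
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


def select_samples(samples: List[Sample],
                   image_threshold: float,
                   keep_ratio: float) -> List[Sample]:
    """Two-stage, training-free filter (assumed structure).

    Stage 1 drops samples whose image score falls below image_threshold.
    Stage 2 keeps, within each class separately, the top keep_ratio fraction
    by annotation score, so no class is filtered out disproportionately.
    """
    survivors = [s for s in samples if s.image_score >= image_threshold]

    by_class = defaultdict(list)
    for s in survivors:
        by_class[s.class_name].append(s)

    selected: List[Sample] = []
    for class_name in sorted(by_class):
        group = sorted(by_class[class_name],
                       key=lambda s: s.annotation_score, reverse=True)
        n_keep = max(1, math.ceil(len(group) * keep_ratio))
        selected.extend(group[:n_keep])
    return selected


demo = [
    Sample("a1", "cat", 0.9, 0.8),
    Sample("a2", "cat", 0.2, 0.9),  # removed in stage 1 (low image score)
    Sample("a3", "cat", 0.8, 0.3),
    Sample("d1", "dog", 0.7, 0.6),
    Sample("d2", "dog", 0.6, 0.1),
]
kept_ids = [s.sample_id for s in select_samples(demo, image_threshold=0.5, keep_ratio=0.5)]
```

Ranking within each class, rather than over the whole pool, is what makes the second stage class-balanced in this sketch: a class whose annotations score lower overall still retains its best samples.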

Published

2025-04-11

How to Cite

Tang, H., Yu, S., Pang, J., & Zhang, B. (2025). A Training-free Synthetic Data Selection Method for Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7229–7237. https://doi.org/10.1609/aaai.v39i7.32777

Section

AAAI Technical Track on Computer Vision VI