FlexDataset: Crafting Annotated Dataset Generation for Diverse Applications

Authors

  • Ellen Yi-Ge Carnegie Mellon University
  • Leo Shawn University of the Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i9.33027

Abstract

High-quality, pixel-level annotated datasets are crucial for training deep learning models, yet their creation is often labor-intensive, time-consuming, and costly. Generative diffusion models have therefore gained prominence for producing synthetic datasets, but existing text-to-data methods struggle to generate complex scenes involving multiple objects and intricate spatial arrangements. To address these limitations, we introduce FlexDataset, a framework that pioneers the composition-to-data (C2D) paradigm. FlexDataset generates high-fidelity synthetic datasets with versatile annotations, tailored for tasks such as salient object detection, depth estimation, and segmentation. Leveraging a meticulously designed composition-to-image (C2I) framework, it offers precise positional and categorical control. Our Versatile Annotation Generation (VAG) Plan A further enhances efficiency by exploiting rich latent representations through tuned perception decoders, reducing annotation time nearly fivefold. FlexDataset enables unlimited generation of customized, multi-instance and multi-category (MIMC) annotated data. Extensive experiments show that FlexDataset sets a new standard in synthetic dataset generation across multiple datasets and tasks, including zero-shot and long-tail scenarios.

Published

2025-04-11

How to Cite

Yi-Ge, E., & Shawn, L. (2025). FlexDataset: Crafting Annotated Dataset Generation for Diverse Applications. Proceedings of the AAAI Conference on Artificial Intelligence, 39(9), 9481–9489. https://doi.org/10.1609/aaai.v39i9.33027

Section

AAAI Technical Track on Computer Vision VIII