CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics

Yiren Song; Xuning Shao; Kang Chen; Weidong Zhang; Zhongliang Jing; Minzhe Li

doi:10.1609/aaai.v37i2.25326

Authors

Yiren Song, Shanghai Jiao Tong University; NetEase Games AI Lab
Xuning Shao, NetEase Games AI Lab
Kang Chen, NetEase Games AI Lab
Weidong Zhang, NetEase Games AI Lab
Zhongliang Jing, Shanghai Jiao Tong University
Minzhe Li, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v37i2.25326

Keywords:

CV: Applications, CV: Language and Vision, ML: Unsupervised & Self-Supervised Learning

Abstract

Considerable progress has recently been made in leveraging CLIP (Contrastive Language-Image Pre-Training) models for text-guided image manipulation. However, all existing works rely on additional generative models to ensure the quality of results, because CLIP alone cannot provide enough guidance for fine-scale, pixel-level changes. In this paper, we introduce CLIPVG, a text-guided image manipulation framework using differentiable vector graphics, which is also the first CLIP-based general image manipulation framework that does not require any additional generative models. We demonstrate that CLIPVG not only achieves state-of-the-art performance in both semantic correctness and synthesis quality, but is also flexible enough to support various applications far beyond the capability of all existing methods.
