DragNeXt: Rethinking Drag-Based Image Editing

Authors

  • Yuan Zhou Nanyang Technological University
  • Junbao Zhou Nanyang Technological University
  • Qingshan Xu Nanyang Technological University
  • Kesen Zhao Nanyang Technological University
  • Yuxuan Wang Nanyang Technological University
  • Hao Fei National University of Singapore
  • Richang Hong Hefei University of Technology
  • Hanwang Zhang Nanyang Technological University

DOI:

https://doi.org/10.1609/aaai.v40i16.38390

Abstract

Drag-Based Image Editing (DBIE), which allows users to manipulate images by directly dragging objects within them, has recently attracted much attention from the community. However, it faces two key challenges: (i) point-based drag is often highly ambiguous and difficult to align with user intentions; (ii) current DBIE methods primarily rely on alternating between motion supervision and point tracking, which is not only cumbersome but also fails to produce high-quality results. These limitations motivate us to explore DBIE from a new perspective: unifying it as a Latent Region Optimization (LRO) problem that aims to use region-level geometric transformations to optimize the latent code to realize drag manipulation. Thus, by specifying the areas and types of geometric transformations, we can effectively address the ambiguity issue. We also propose a simple yet effective editing framework, dubbed DragNeXt. It solves LRO through Progressive Backward Self-Intervention (PBSI), simplifying the overall procedure of the alternating workflow while further enhancing quality by fully leveraging region-level structure information and progressive guidance from intermediate drag states. We validate DragNeXt on our NextBench, and extensive experiments demonstrate that our proposed method can significantly outperform existing approaches.
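
The abstract's idea of specifying a region plus a transformation type, and of guiding the edit through intermediate drag states, can be pictured with a toy sketch. This is not the paper's PBSI procedure and involves no diffusion model or latent optimization; it only assumes a 2D grid standing in for a latent map, a boolean mask for the user-selected region, and a pure translation as the transformation. All function names below are hypothetical.

```python
def translate_region(grid, mask, dx, dy):
    """Move the masked cells of a 2D grid by (dx, dy).

    Cells vacated by the region are set to 0; cells that would land
    outside the grid are dropped. (Toy stand-in for a region-level
    geometric transformation; translation only.)
    """
    h, w = len(grid), len(grid[0])
    # Start from a copy in which the selected region has been cleared.
    out = [[0 if mask[y][x] else grid[y][x] for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[ny][nx] = grid[y][x]
    return out


def intermediate_offsets(dx, dy, steps):
    """Split a full drag (dx, dy) into `steps` cumulative integer offsets."""
    return [(round(dx * k / steps), round(dy * k / steps))
            for k in range(1, steps + 1)]


def progressive_targets(grid, mask, dx, dy, steps):
    """Build one target grid per intermediate drag state (hypothetical helper)."""
    return [translate_region(grid, mask, ox, oy)
            for ox, oy in intermediate_offsets(dx, dy, steps)]


if __name__ == "__main__":
    grid = [[7, 0, 0, 0, 0]]
    mask = [[True, False, False, False, False]]
    for target in progressive_targets(grid, mask, dx=4, dy=0, steps=2):
        print(target)
```

In this sketch each intermediate target would play the role of a guidance signal for one stage of an optimization loop; how DragNeXt actually derives and applies such guidance is described in the paper itself, not here.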

Published

2026-03-14

How to Cite

Zhou, Y., Zhou, J., Xu, Q., Zhao, K., Wang, Y., Fei, H., Hong, R., & Zhang, H. (2026). DragNeXt: Rethinking Drag-Based Image Editing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13818-13825. https://doi.org/10.1609/aaai.v40i16.38390

Section

AAAI Technical Track on Computer Vision XIII