PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching

Authors

  • Zewei Chang, VCIP, CS, Nankai University
  • Zheng-Peng Duan, VCIP, CS, Nankai University
  • Jianxing Zhang, Samsung R&D Institute China - Beijing (SRC-B)
  • Chun-Le Guo, VCIP, CS, Nankai University; NKIARI, Shenzhen Futian
  • Siyu Liu, VCIP, CS, Nankai University
  • Hyungju Chun, Camera Innovation Group, Samsung Electronics
  • Hyunhee Park, Camera Innovation Group, Samsung Electronics
  • Zikun Liu, Samsung R&D Institute China - Beijing (SRC-B)
  • Chongyi Li, VCIP, CS, Nankai University; NKIARI, Shenzhen Futian

DOI:

https://doi.org/10.1609/aaai.v40i4.37264

Abstract

Image retouching aims to enhance visual quality while aligning with users' personalized aesthetic preferences. To address the challenge of balancing controllability and subjectivity, we propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. Using parameter maps containing attribute values in specific semantic regions as input, PerTouch constructs an explicit parameter-to-image mapping for fine-grained image retouching. To improve semantic boundary perception, we introduce semantic replacement and parameter perturbation mechanisms during training. To connect natural language instructions with visual control, we develop a VLM-driven agent to handle both strong and weak user instructions. Equipped with mechanisms of feedback-driven rethinking and scene-aware memory, PerTouch better aligns with user intent and captures long-term preferences. Extensive experiments demonstrate each component’s effectiveness and the superior performance of PerTouch in personalized image retouching.
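The parameter maps described above can be pictured as per-pixel grids of retouching attribute values, one set per semantic region. Below is a minimal illustrative sketch, assuming a hypothetical map format: the paper's actual attribute set, value ranges, and map encoding are not specified here, and `build_parameter_map` is an invented helper, not the authors' implementation.

```python
import numpy as np

def build_parameter_map(seg_mask, region_params, num_attrs):
    """Expand per-region attribute values into a per-pixel map of shape (H, W, num_attrs).

    seg_mask: (H, W) integer array of semantic labels.
    region_params: {label: [attr_0, attr_1, ...]} retouching values per region.
    """
    h, w = seg_mask.shape
    param_map = np.zeros((h, w, num_attrs), dtype=np.float32)
    for label, values in region_params.items():
        # Broadcast this region's attribute vector to all of its pixels.
        param_map[seg_mask == label] = values
    return param_map

# Toy example: a 4x4 image where label 0 = "sky", label 1 = "foreground".
seg = np.array([[0, 0, 1, 1]] * 4)
params = {0: [0.3, -0.1],  # e.g. [exposure, saturation] for the sky
          1: [0.0, 0.5]}   # and for the foreground
pmap = build_parameter_map(seg, params, num_attrs=2)
```

A map like this gives the diffusion model an explicit, spatially grounded conditioning signal, which is what makes the parameter-to-image mapping fine-grained at the semantic-region level rather than global.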

Published

2026-03-14

How to Cite

Chang, Z., Duan, Z.-P., Zhang, J., Guo, C.-L., Liu, S., Chun, H., Park, H., Liu, Z., & Li, C. (2026). PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching. Proceedings of the AAAI Conference on Artificial Intelligence, 40(4), 2752-2759. https://doi.org/10.1609/aaai.v40i4.37264

Section

AAAI Technical Track on Computer Vision I