Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning

Authors

  • Yuti Liu, vivo Mobile Communication Co., Ltd
  • Shice Liu, vivo Mobile Communication Co., Ltd
  • Junyuan Gao, vivo Mobile Communication Co., Ltd
  • Peng-tao Jiang, vivo Mobile Communication Co., Ltd
  • Hao Zhang, vivo Mobile Communication Co., Ltd
  • Jinwei Chen, vivo Mobile Communication Co., Ltd
  • Bo Li, vivo Mobile Communication Co., Ltd

DOI:

https://doi.org/10.1609/aaai.v39i6.32613

Abstract

Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values and identifying its highlights and areas for improvement. Traditional IAA methods often concentrate on a single aesthetic task and suffer from inadequate labeled datasets, which impairs in-depth aesthetic comprehension. Despite efforts to overcome this challenge by applying Multi-modal Large Language Models (MLLMs), such models remain underdeveloped for IAA purposes. To address this, we propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight. Central to our approach is an innovative multi-scale text-guided self-supervised learning technique. This technique features a multi-scale feature alignment module and capitalizes on a wealth of unlabeled data in a self-supervised manner to structurally and functionally enhance aesthetic ability. Empirical evidence indicates that, when accompanied by extensive instruction tuning, our model achieves state-of-the-art results across multiple tasks, including aesthetic scoring, aesthetic commenting, and personalized image aesthetic assessment. Remarkably, it also demonstrates zero-shot learning capabilities in the emerging task of aesthetic suggesting. Furthermore, for personalized image aesthetic assessment, we harness the potential of in-context learning and showcase its inherent advantages.

Published

2025-04-11

How to Cite

Liu, Y., Liu, S., Gao, J., Jiang, P.-T., Zhang, H., Chen, J., & Li, B. (2025). Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(6), 5748–5756. https://doi.org/10.1609/aaai.v39i6.32613

Section

AAAI Technical Track on Computer Vision V