Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment

Authors

  • Zhicheng Liao, South China Normal University
  • Dongxu Wu, South China Normal University
  • Zhenshan Shi, South China Normal University
  • Sijie Mai, South China Normal University
  • Hanwei Zhu, Nanyang Technological University
  • Lingyu Zhu, City University of Hong Kong
  • Yuncheng Jiang, South China Normal University
  • Baoliang Chen, South China Normal University

DOI:

https://doi.org/10.1609/aaai.v40i9.37627

Abstract

Recent efforts have repurposed the Contrastive Language-Image Pre-training (CLIP) model for No-Reference Image Quality Assessment (NR-IQA) by measuring the cosine similarity between the image embedding and the embeddings of textual prompts such as "a good photo" or "a bad photo." However, this semantic similarity overlooks a critical yet underexplored cue: the magnitude of the CLIP image features, which we empirically find to correlate strongly with perceptual quality. In this work, we introduce a novel adaptive fusion framework that complements cosine similarity with a magnitude-aware quality cue. Specifically, we first take the absolute values of the CLIP image features and apply a Box-Cox transformation to statistically normalize the feature distribution and mitigate semantic sensitivity. The resulting scalar summary serves as a semantically normalized auxiliary cue that complements cosine-based prompt matching. To integrate both cues effectively, we further design a confidence-guided fusion scheme that adaptively weighs each term according to its relative strength. Extensive experiments on multiple benchmark IQA datasets demonstrate that our method consistently outperforms standard CLIP-based IQA and state-of-the-art baselines, without any task-specific training.
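A minimal NumPy sketch of the two cues and their fusion, as described in the abstract, follows. It is not the authors' implementation. Random vectors stand in for raw (un-normalized) CLIP image features and prompt embeddings. The following are all assumptions: the antonym-prompt softmax in the CLIP-IQA style with a temperature of 0.01, a single Box-Cox lambda fitted over the whole batch by a grid-searched profile likelihood, the per-image mean as the scalar summary, and the fusion rule (batch z-scoring plus a sigmoid, with each cue's confidence taken as its distance from 0.5). The function names are hypothetical.

```python
import numpy as np


def cosine_quality(img_feats, good_emb, bad_emb, temperature=0.01):
    """Antonym-prompt cue (CLIP-IQA-style assumption): softmax over cosine
    similarities to a "good photo" and a "bad photo" text embedding."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    img = unit(img_feats)                                   # (B, D)
    sims = np.stack([img @ unit(good_emb), img @ unit(bad_emb)], axis=-1)
    logits = sims / temperature
    logits -= logits.max(axis=-1, keepdims=True)            # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs[:, 0]                                      # P("good") per image


def boxcox(x, lam):
    """Standard Box-Cox transform for strictly positive x."""
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam


def boxcox_mle_lambda(x, grid=np.linspace(-2.0, 2.0, 401)):
    """Choose lambda by maximizing the Box-Cox profile log-likelihood on a grid."""
    x = np.ravel(x)
    log_sum = np.log(x).sum()
    best_lam, best_llf = 0.0, -np.inf
    for lam in grid:
        y = boxcox(x, lam)
        llf = -0.5 * x.size * np.log(y.var()) + (lam - 1.0) * log_sum
        if llf > best_llf:
            best_lam, best_llf = lam, llf
    return best_lam


def magnitude_quality(img_feats, eps=1e-6):
    """Magnitude cue: absolute raw features -> Box-Cox -> per-image mean.
    One lambda is fitted on the whole batch (an assumption) so the
    per-image scalars share a common scale."""
    a = np.abs(img_feats) + eps                             # Box-Cox needs x > 0
    lam = boxcox_mle_lambda(a)
    return boxcox(a, lam).mean(axis=1)                      # (B,) scalar summary


def confidence_fusion(q_cos, q_mag):
    """Hypothetical confidence-guided fusion: each cue's confidence is its
    distance from the undecided value 0.5."""
    z = (q_mag - q_mag.mean()) / (q_mag.std() + 1e-8)
    p_mag = 1.0 / (1.0 + np.exp(-z))                        # map magnitude cue to (0, 1)
    c_cos, c_mag = np.abs(q_cos - 0.5), np.abs(p_mag - 0.5)
    w = c_cos / (c_cos + c_mag + 1e-8)                      # weight on the cosine cue
    return w * q_cos + (1.0 - w) * p_mag


# Usage with random stand-ins for raw CLIP image features and prompt embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 512))
good, bad = rng.normal(size=512), rng.normal(size=512)
scores = confidence_fusion(cosine_quality(feats, good, bad), magnitude_quality(feats))
```

The magnitude cue is computed on the features before L2 normalization, because normalizing to unit length would discard exactly the information it measures. Fitting one lambda per batch keeps the per-image summaries comparable. Fitting it per image would normalize each image's distribution separately, at the cost of putting the scalars on different scales.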


Published

2026-03-14

How to Cite

Liao, Z., Wu, D., Shi, Z., Mai, S., Zhu, H., Zhu, L., … Chen, B. (2026). Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 6934–6942. https://doi.org/10.1609/aaai.v40i9.37627

Section

AAAI Technical Track on Computer Vision VI