Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Authors

  • Lin Li, The Hong Kong University of Science and Technology; AI Chip Center for Emerging Smart Systems
  • Wei Chen, The Hong Kong University of Science and Technology
  • Jiahui Li, Zhejiang University
  • Kwang-Ting Cheng, The Hong Kong University of Science and Technology; AI Chip Center for Emerging Smart Systems
  • Long Chen, The Hong Kong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i8.37557

Abstract

Recent advances in multi-modal large language models (MLLMs) have significantly improved object-level grounding and region captioning. However, these models remain limited in visual relation understanding: they struggle even with binary relation detection, let alone N-ary relations involving multiple semantic roles. The core reason is that they do not model the structural semantic dependencies among multiple entities, which leads to over-reliance on language priors (e.g., defaulting to "person drinks milk" when the person is merely holding it). To this end, we propose Relation-R1, the first unified relation comprehension framework that explicitly integrates cognitive chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and group relative policy optimization (GRPO) within a reinforcement learning (RL) paradigm. Specifically, we first establish foundational reasoning capabilities via SFT, enforcing structured outputs that include the thinking process. GRPO then refines these outputs through multi-reward optimization, prioritizing visual-semantic grounding over language-induced biases and thereby improving generalization. Furthermore, we investigate the impact of various CoT strategies within this framework and show that a specific-to-general progressive approach to CoT guidance further improves generalization, especially in capturing synonymous N-ary relations. Extensive experiments on the widely used PSG and SWiG datasets demonstrate that Relation-R1 achieves state-of-the-art performance in both binary and N-ary relation understanding.
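The GRPO stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `<think>`/`<answer>` tag template, the two reward components, their weights, and all function names are assumptions chosen for illustration, since the abstract does not specify them. The sketch shows only the general idea of combining several rewards per sampled completion and normalizing them within a group, which is how GRPO avoids a separate value model; the choice of population standard deviation is likewise an assumption.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template: the tag names are an assumption,
# not taken from the paper.
_TEMPLATE = re.compile(
    r"\s*<think>.+?</think>\s*<answer>.+?</answer>\s*", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion matches the assumed think/answer template."""
    return 1.0 if _TEMPLATE.fullmatch(completion) else 0.0


def combine_rewards(fmt: float, task: float,
                    w_fmt: float = 1.0, w_task: float = 1.0) -> float:
    """Weighted sum of a format reward and a task reward (weights are illustrative)."""
    return w_fmt * fmt + w_task * task


def group_relative_advantages(rewards, eps: float = 1e-8):
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: one group of sampled completions for the same image and prompt.
completions = [
    "<think>The cup is in the hand, not at the mouth.</think>"
    "<answer>person holds milk</answer>",
    "person drinks milk",
]
task_scores = [1.0, 0.0]  # hypothetical relation-accuracy scores
rewards = [combine_rewards(format_reward(c), s)
           for c, s in zip(completions, task_scores)]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged. When every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal.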

Published

2026-03-14

How to Cite

Li, L., Chen, W., Li, J., Cheng, K.-T., & Chen, L. (2026). Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6306–6314. https://doi.org/10.1609/aaai.v40i8.37557

Issue

Section

AAAI Technical Track on Computer Vision V