Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Lin Li; Wei Chen; Jiahui Li; Kwang-Ting Cheng; Long Chen

doi:10.1609/aaai.v40i8.37557

Authors

Lin Li, The Hong Kong University of Science and Technology; AI Chip Center for Emerging Smart Systems
Wei Chen, The Hong Kong University of Science and Technology
Jiahui Li, Zhejiang University
Kwang-Ting Cheng, The Hong Kong University of Science and Technology; AI Chip Center for Emerging Smart Systems
Long Chen, The Hong Kong University of Science and Technology

Abstract

Recent advances in multi-modal large language models (MLLMs) have significantly improved object-level grounding and region captioning. However, these models remain limited in visual relation understanding: they struggle even with binary relation detection, let alone N-ary relations involving multiple semantic roles. The core reason is that they do not model the structural semantic dependencies among multiple entities, which leads to over-reliance on language priors (e.g., defaulting to "person drinks milk" when the person is merely holding it). To this end, we propose Relation-R1, the first unified relation comprehension framework that explicitly integrates cognitive chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and group relative policy optimization (GRPO) within a reinforcement learning (RL) paradigm. Specifically, we first establish foundational reasoning capabilities via SFT, enforcing structured outputs that include the thinking process. GRPO then refines these outputs through multi-reward optimization, prioritizing visual-semantic grounding over language-induced biases and thereby improving generalization. Furthermore, we investigate the impact of various CoT strategies within this framework and show that a specific-to-general progressive approach to CoT guidance further improves generalization, especially in capturing synonymous N-ary relations. Extensive experiments on the widely used PSG and SWiG datasets demonstrate that Relation-R1 achieves state-of-the-art performance in both binary and N-ary relation understanding.
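
To make the GRPO step in the abstract more concrete, the following is a minimal sketch of the group-relative advantage computation that gives GRPO its name: rewards for a group of responses sampled from the same prompt are normalized by the group mean and standard deviation. The abstract does not specify the reward terms or their weights, so `combined_reward`, its `format_ok` and `relation_score` inputs, and the weights are hypothetical placeholders, and the use of the population standard deviation is an assumption rather than the authors' stated choice.

```python
from statistics import mean, pstdev


def combined_reward(format_ok, relation_score, w_format=0.5, w_relation=1.0):
    """Hypothetical multi-reward: weighted sum of a format check and a task score.

    The reward terms and weights are illustrative assumptions,
    not the reward design used in the paper.
    """
    return w_format * float(format_ok) + w_relation * relation_score


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is credited only for beating its own group's average.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses for one prompt: (format_ok, relation_score).
    samples = [(True, 0.5), (True, 0.0), (False, 0.5), (False, 0.0)]
    rewards = [combined_reward(ok, score) for ok, score in samples]
    print(group_relative_advantages(rewards))
```

Because the advantages are centered within each group, a group in which every response earns the same reward contributes no learning signal, which is one reason combining several reward terms can help separate otherwise tied responses.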
