Reasoning via Implicit Self-supervised Emergence for Instruction Segmentation

Qing Zhou; Lichang Yang; Yuyu Jia; Junyu Gao; Weiping Ni; Junzheng Wu; Qi Wang

doi:10.1609/aaai.v40i16.38382

Authors

Qing Zhou Northwestern Polytechnical University
Lichang Yang Northwestern Polytechnical University
Yuyu Jia Northwestern Polytechnical University
Junyu Gao Northwestern Polytechnical University
Weiping Ni Northwest Institute of Nuclear Technology
Junzheng Wu Northwest Institute of Nuclear Technology
Qi Wang Northwestern Polytechnical University

DOI:

https://doi.org/10.1609/aaai.v40i16.38382

Abstract

We challenge the assumption that complex instruction-guided segmentation tasks require equally complex, explicit supervision. This paper introduces RISE (Reasoning via Implicit Self-supervised Emergence), a framework that learns intricate compositional reasoning, ranging from spatial relations to world knowledge, without a single ground-truth mask. To achieve this, RISE is trained with reinforcement learning using GRPO (Group Relative Policy Optimization), guided by a single, strikingly simple reward: the semantic alignment score between the textual instruction and the predicted image region. Our primary finding is that a high-quality chain-of-thought process emerges implicitly from this minimalist signal. Within a structured format, the model autonomously learns to interpret instructions by drawing on its latent knowledge and inferring spatial relationships; these capabilities are inherent in its architecture but are unlocked by our simple objective. This emergent reasoning yields highly competitive results: RISE achieves 58.7 gIoU on the ReasonSeg benchmark, on par with methods that use geometric rewards. We also demonstrate extreme data efficiency: a variant trained on only 2,000 ImageNet image-label pairs sets a new state of the art for annotation-free referring segmentation, with 79.6 cIoU on RefCOCO.
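To make the training signal concrete, here is a minimal sketch of how a single alignment reward could drive GRPO-style updates. It assumes the instruction and each sampled region have already been encoded into vectors by some image-text encoder, which the abstract does not specify. Cosine similarity stands in for the semantic alignment score, and `alignment_rewards` and `group_relative_advantages` are hypothetical helpers, not code from RISE. The group-relative normalization follows the usual GRPO advantage: each sampled response is scored against the mean and standard deviation of its own group, so no separate value network is needed.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def alignment_rewards(text_emb: Sequence[float],
                      region_embs: List[Sequence[float]]) -> List[float]:
    """Hypothetical reward: one alignment score per sampled region in a group.

    Assumption: cosine similarity between an instruction embedding and a
    region embedding approximates the paper's semantic alignment score.
    """
    return [cosine_similarity(text_emb, r) for r in region_embs]


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward within its own group.

    Uses the population standard deviation; eps avoids division by zero
    when every response in the group earns the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Toy group of three sampled regions for one instruction.
    text_emb = [1.0, 0.0]
    region_embs = [[1.0, 0.0], [0.0, 1.0], [0.5 ** 0.5, 0.5 ** 0.5]]
    rewards = alignment_rewards(text_emb, region_embs)
    print(rewards, group_relative_advantages(rewards))
```

Regions that align with the instruction better than their siblings receive positive advantages and are reinforced, while the rest are pushed down. Implementations differ on whether the population or sample standard deviation is used.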
