SHADOW: Dynamic-Aware Credit Assignment Against Long-Horizon Tasks

Authors

  • Yuze Liu — Zhejiang University; Shanghai Artificial Intelligence Laboratory
  • Chaochao Lu — Shanghai Artificial Intelligence Laboratory
  • Chao Yang — Shanghai Artificial Intelligence Laboratory

DOI:

https://doi.org/10.1609/aaai.v40i28.39570

Abstract

Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model (LLM) agents to solve complex, multi-step tasks through environmental interaction. A fundamental challenge in such long-horizon scenarios is credit assignment, as delayed rewards provide inadequate signals for evaluating the contribution of individual actions. Existing methods typically neglect trajectory transition dynamics, which leads to coarse-grained or biased credit assignment. To address these limitations, we introduce SHADOW, a novel framework that systematically incorporates transition dynamics for improved credit assignment. Our framework makes two primary contributions: (i) a dynamics-aware state grouping mechanism that mitigates misleading action comparisons between dynamically inconsistent states, and (ii) a local dynamic advantage estimator that leverages Generalized Advantage Estimation (GAE) to precisely quantify individual action contributions through a fine-grained analysis of transition patterns. Comprehensive experiments conducted with the Qwen2.5-1.5B/7B-Instruct agent models demonstrate that our method achieves success rate improvements of 9.4%/7.6% on the ALFWorld benchmark and a performance gain of over 5% on WebShop.
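For context on the estimator the abstract builds upon: the sketch below implements standard Generalized Advantage Estimation (GAE), not SHADOW's local dynamic variant, whose details are in the paper. The function name, argument layout, and the convention that `values` includes a final bootstrap entry are illustrative choices, not from the source.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE: A_t = sum_k (gamma*lam)^k * delta_{t+k},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T)  (length T+1; last entry is the bootstrap value)
    Returns an array of advantage estimates A_0 .. A_{T-1}.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Backward recursion: running accumulates the lambda-discounted TD errors.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

# With gamma = lam = 1 and zero values, advantages reduce to rewards-to-go:
# gae_advantages([1, 1, 1], [0, 0, 0, 0], gamma=1.0, lam=1.0) -> [3., 2., 1.]
```

Lambda trades bias for variance: `lam=0` yields one-step TD errors, `lam=1` yields full Monte Carlo returns minus the baseline.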

Published

2026-03-14

How to Cite

Liu, Y., Lu, C., & Yang, C. (2026). SHADOW: Dynamic-Aware Credit Assignment Against Long-Horizon Tasks. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23935–23944. https://doi.org/10.1609/aaai.v40i28.39570

Section

AAAI Technical Track on Machine Learning V