SHADOW: Dynamic-Aware Credit Assignment Against Long-Horizon Tasks

Authors

  • Yuze Liu — Zhejiang University; Shanghai Artificial Intelligence Laboratory
  • Chaochao Lu — Shanghai Artificial Intelligence Laboratory
  • Chao Yang — Shanghai Artificial Intelligence Laboratory

DOI:

https://doi.org/10.1609/aaai.v40i28.39570

Abstract

Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model (LLM) agents to solve complex, multi-step tasks through environmental interaction. A fundamental challenge in such long-horizon scenarios is credit assignment, as delayed rewards provide inadequate signals for evaluating the contribution of individual actions. Existing methods typically neglect trajectory transition dynamics, which leads to coarse-grained or biased credit assignment. To address these limitations, we introduce SHADOW, a novel framework that systematically incorporates transition dynamics for improved credit assignment. Our framework makes two primary contributions: (i) a dynamics-aware state grouping mechanism that mitigates misleading action comparisons between dynamically inconsistent states, and (ii) a local dynamic advantage estimator that leverages Generalized Advantage Estimation (GAE) to precisely quantify individual action contributions through a fine-grained analysis of transition patterns. Comprehensive experiments conducted with the Qwen2.5-1.5B/7B-Instruct agent models demonstrate that our method achieves success rate improvements of 9.4%/7.6% on the ALFWorld benchmark and a performance gain of over 5% on WebShop.
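For context on the estimator the abstract builds upon: the sketch below implements standard Generalized Advantage Estimation (GAE), not SHADOW's local dynamic variant, whose details are in the paper. The function name, argument layout, and the convention that `values` includes a final bootstrap entry are illustrative choices, not from the source.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE: A_t = sum_k (gamma*lam)^k * delta_{t+k},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T)  (length T+1; last entry is the bootstrap value)
    Returns an array of advantage estimates A_0 .. A_{T-1}.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Backward recursion: running accumulates the lambda-discounted TD errors.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

# With gamma = lam = 1 and zero values, advantages reduce to rewards-to-go:
# gae_advantages([1, 1, 1], [0, 0, 0, 0], gamma=1.0, lam=1.0) -> [3., 2., 1.]
```

Lambda trades bias for variance: `lam=0` yields one-step TD errors, `lam=1` yields full Monte Carlo returns minus the baseline.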

Published

2026-03-14

How to Cite

Liu, Y., Lu, C., & Yang, C. (2026). SHADOW: Dynamic-Aware Credit Assignment Against Long-Horizon Tasks. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23935–23944. https://doi.org/10.1609/aaai.v40i28.39570

Section

AAAI Technical Track on Machine Learning V