DSAP: Enhancing Generalization in Goal-Conditioned Reinforcement Learning
DOI: https://doi.org/10.1609/aaai.v40i31.39877
Abstract
Goal-conditioned Reinforcement Learning (RL) is a promising direction for training agents capable of tackling a variety of tasks. However, generalizing to new goals in different environments remains a central challenge for goal-conditioned RL agents. Existing methods often rely on state abstraction, which involves learning abstracted state representations by excluding irrelevant features, to improve generalization. Despite their success in simplified settings, these methods often fail to generalize effectively to realistic environments with varied goals. In this work, we propose to enhance generalization through state abstraction from the perspective of causal inference. We hypothesize that the generalization gap arises in part due to unobserved confounders: latent variables that simultaneously influence both the global and goal states. To address this, we introduce Deconfounded State Abstraction for Policy learning (DSAP), a novel framework that mitigates backdoor confounding by employing a learned causal graph as a *proxy* for the hidden confounders. We provide theoretical analysis demonstrating that DSAP improves both the learning process and the generalization capability of goal-conditioned policies. Extensive experiments across different settings of multiple benchmarks show that our method significantly outperforms existing methods.
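As background for the term "backdoor confounding" used in the abstract, the following is a minimal sketch of the textbook backdoor adjustment on a toy discrete model. It is not the DSAP algorithm: the variables `Z`, `X`, `Y`, the probability tables, and the function names are all hypothetical, and the confounder is assumed to be observed here, whereas the abstract describes hidden confounders handled through a learned causal graph used as a proxy.

```python
# Toy illustration of backdoor adjustment (hypothetical numbers, not from the paper).
# Z is a binary confounder that influences both X and Y.

p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (1, 0): 0.3, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): plain conditioning, which implicitly weights z by P(z | x)."""
    num = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    den = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return num / den


def backdoor_adjusted(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z): weights z by its marginal."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive_effect = observational(1) - observational(0)
causal_effect = backdoor_adjusted(1) - backdoor_adjusted(0)
print(f"naive difference:    {naive_effect:.2f}")
print(f"adjusted difference: {causal_effect:.2f}")
```

In this toy model the effect of `X` on `Y` is 0.2 within each stratum of `Z`. The adjusted difference recovers that value, while the naive conditional difference is inflated because `Z` raises the probability of both `X` and `Y`.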
Published
2026-03-14
How to Cite
Wang, Y., Zhao, K., Yang, M., Li, Y., Liu, F., Chen, J., & U, L. H. (2026). DSAP: Enhancing Generalization in Goal-Conditioned Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 26679–26687. https://doi.org/10.1609/aaai.v40i31.39877
Section
AAAI Technical Track on Machine Learning VIII