PQDA: Policy-Aligned Q-Consistency Meets Decoupled Augmentation for Generalizable Visual RL

Authors

  • Yun Zhou School of Artificial Intelligence, Anhui University, Hefei, China Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China Anhui Provincial Key Laboratory of Security Artificial Intelligence, Anhui University, Hefei, China
  • Yuqiang Wu School of Artificial Intelligence, Anhui University, Hefei, China
  • Chunyu Tan School of Artificial Intelligence, Anhui University, Hefei, China Anhui Provincial Key Laboratory of Security Artificial Intelligence, Anhui University, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v40i34.40145

Abstract

A fundamental challenge in visual reinforcement learning (RL) is achieving robust generalization across environments with varying visual distractions. Current RL methods struggle to generalize because they cannot differentiate foreground from background features during augmentation, while their Q-consistency mechanisms rely on outdated actions from replay buffers that drift from the current policy. In this paper, we present PQDA, a novel framework that addresses generalization challenges in RL through two key innovations: (1) Foreground-Background Decoupled Augmentation leverages Gaussian mixture model-based segmentation to efficiently generate and cache masks in replay buffers, applying differentiated augmentation strategies to foreground and background regions, thereby enhancing data diversity while preserving task-relevant features. (2) Policy-Aligned Q-Consistency enforces policy alignment by sampling actions from the current policy for Q-regularization, achieving faster and more stable convergence. Notably, PQDA eliminates auxiliary tasks entirely through a unified architecture that co-optimizes the encoder and RL components directly. Extensive experiments on DMControl benchmarks (including our newly proposed CVDMC benchmark) and robotic manipulation tasks demonstrate PQDA's superior generalization performance, outperforming state-of-the-art methods.
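To make the first innovation concrete, the sketch below illustrates the general idea of GMM-based foreground-background decoupled augmentation: a two-component Gaussian mixture is fit to pixel intensities via EM, the larger component is heuristically taken as background, and strong augmentation (here, additive noise) is applied only to background pixels. This is an illustrative NumPy-only approximation under our own assumptions (the function names, the intensity-only GMM, the larger-cluster-is-background heuristic, and the noise augmentation are ours), not the paper's exact implementation:

```python
import numpy as np

def gmm_background_mask(gray, iters=20):
    """Fit a 2-component 1D GMM to pixel intensities via EM; return a
    boolean mask marking the larger (assumed background) component.
    Illustrative sketch only, not PQDA's actual segmenter."""
    x = gray.reshape(-1).astype(np.float64)
    mu = np.array([x.min(), x.max()])          # init means at intensity extremes
    var = np.array([x.var() + 1e-6] * 2)       # shared initial variance
    pi = np.array([0.5, 0.5])                  # equal mixing weights
    for _ in range(iters):
        # E-step: per-pixel responsibilities under each Gaussian
        logp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        logp += np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture parameters from responsibilities
        nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / x.size
    labels = r.argmax(axis=1)
    bg = np.bincount(labels, minlength=2).argmax()  # larger cluster ~ background
    return (labels == bg).reshape(gray.shape)

def decoupled_augment(obs, mask, rng):
    """Strongly augment only background pixels; leave foreground intact."""
    out = obs.astype(np.float64).copy()
    out[mask] += rng.normal(0.0, 25.0, size=obs.shape)[mask]
    return np.clip(out, 0, 255).astype(obs.dtype)
```

In practice the mask would be computed once per observation and cached in the replay buffer, as the abstract describes, so segmentation cost is not paid on every sampled minibatch.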

Published

2026-03-14

How to Cite

Zhou, Y., Wu, Y., & Tan, C. (2026). PQDA: Policy-Aligned Q-Consistency Meets Decoupled Augmentation for Generalizable Visual RL. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 29080–29088. https://doi.org/10.1609/aaai.v40i34.40145

Section

AAAI Technical Track on Machine Learning XI