PQDA: Policy-Aligned Q-Consistency Meets Decoupled Augmentation for Generalizable Visual RL

Authors

  • Yun Zhou School of Artificial Intelligence, Anhui University, Hefei, China Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China Anhui Provincial Key Laboratory of Security Artificial Intelligence, Anhui University, Hefei, China
  • Yuqiang Wu School of Artificial Intelligence, Anhui University, Hefei, China
  • Chunyu Tan School of Artificial Intelligence, Anhui University, Hefei, China Anhui Provincial Key Laboratory of Security Artificial Intelligence, Anhui University, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v40i34.40145

Abstract

A fundamental challenge in visual reinforcement learning (RL) is achieving robust generalization across environments with varying visual distractions. Current RL methods struggle to generalize because they cannot differentiate foreground from background features during augmentation, while their Q-consistency mechanisms rely on outdated actions from replay buffers that drift from the current policy. In this paper, we present PQDA, a novel framework that addresses generalization challenges in RL through two key innovations: (1) Foreground-Background Decoupled Augmentation leverages Gaussian mixture model-based segmentation to efficiently generate and cache masks in replay buffers, applying differentiated augmentation strategies to foreground and background regions, thereby enhancing data diversity while preserving task-relevant features. (2) Policy-Aligned Q-Consistency enforces policy alignment by sampling actions from the current policy for Q-regularization, achieving faster and more stable convergence. Notably, PQDA eliminates auxiliary tasks entirely through a unified architecture that co-optimizes the encoder and RL components directly. Extensive experiments on DMControl benchmarks (including our newly proposed CVDMC benchmark) and robotic manipulation tasks demonstrate PQDA's superior generalization performance, outperforming state-of-the-art methods.
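To make the first innovation concrete, the sketch below illustrates the general idea of GMM-based foreground-background decoupled augmentation: a two-component Gaussian mixture is fit to pixel intensities via EM, the larger component is heuristically taken as background, and strong augmentation (here, additive noise) is applied only to background pixels. This is an illustrative NumPy-only approximation under our own assumptions (the function names, the intensity-only GMM, the larger-cluster-is-background heuristic, and the noise augmentation are ours), not the paper's exact implementation:

```python
import numpy as np

def gmm_background_mask(gray, iters=20):
    """Fit a 2-component 1D GMM to pixel intensities via EM; return a
    boolean mask marking the larger (assumed background) component.
    Illustrative sketch only, not PQDA's actual segmenter."""
    x = gray.reshape(-1).astype(np.float64)
    mu = np.array([x.min(), x.max()])          # init means at intensity extremes
    var = np.array([x.var() + 1e-6] * 2)       # shared initial variance
    pi = np.array([0.5, 0.5])                  # equal mixing weights
    for _ in range(iters):
        # E-step: per-pixel responsibilities under each Gaussian
        logp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        logp += np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture parameters from responsibilities
        nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / x.size
    labels = r.argmax(axis=1)
    bg = np.bincount(labels, minlength=2).argmax()  # larger cluster ~ background
    return (labels == bg).reshape(gray.shape)

def decoupled_augment(obs, mask, rng):
    """Strongly augment only background pixels; leave foreground intact."""
    out = obs.astype(np.float64).copy()
    out[mask] += rng.normal(0.0, 25.0, size=obs.shape)[mask]
    return np.clip(out, 0, 255).astype(obs.dtype)
```

In practice the mask would be computed once per observation and cached in the replay buffer, as the abstract describes, so segmentation cost is not paid on every sampled minibatch.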

Published

2026-03-14

How to Cite

Zhou, Y., Wu, Y., & Tan, C. (2026). PQDA: Policy-Aligned Q-Consistency Meets Decoupled Augmentation for Generalizable Visual RL. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 29080–29088. https://doi.org/10.1609/aaai.v40i34.40145

Section

AAAI Technical Track on Machine Learning XI