Perceiving the Knowledge Boundary: Uncertainty-Guided Exploration and Imagination for World Models

Zhenxian Liu; Peixi Peng; Yangru Huang; Yonghong Tian

doi:10.1609/aaai.v40i28.39576

Authors

Zhenxian Liu: National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, China
Peixi Peng: School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, China; Peng Cheng Laboratory, China
Yangru Huang: National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, China
Yonghong Tian: National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, China; School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, China; Peng Cheng Laboratory, China

DOI:

https://doi.org/10.1609/aaai.v40i28.39576

Abstract

World-model-based reinforcement learning achieves high sample efficiency by learning from imagined rollouts. However, its success critically depends on the accuracy of the learned world model, which is prone to producing unrealistic or hallucinated rollouts when queried beyond its domain of competence. These flawed predictions can trap the agent in a vicious cycle: by misleading exploration toward implausible or uninformative regions, they degrade the quality of collected data, which in turn corrupts policy learning with inaccurate rollouts. To break this cycle, we introduce the notion of a knowledge boundary—the region within which the world model provides reliable predictions—and propose a unified framework that both identifies and leverages this boundary. Concretely, we approximate the boundary using model uncertainty, quantified via disagreement across an ensemble of lightweight predictors, which serves as a practical proxy. This uncertainty signal is used in two complementary ways: as an intrinsic reward to guide exploration toward under-explored yet learnable regions, and as a dynamic filter to exclude unreliable imagined rollouts from policy optimization. Extensive experiments across diverse benchmarks—including CARLA, DeepMind Control Suite, Atari, and MemoryMaze—demonstrate that our approach consistently outperforms prior state-of-the-art methods.
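
The two uses of the uncertainty signal described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the predictor architecture, the form of the intrinsic reward, or how the filter threshold is set. The function names, the variance-based disagreement measure, the `scale` parameter, and the fixed `threshold` are all assumptions made here for illustration.

```python
from statistics import pvariance
from typing import List, Sequence

Vector = Sequence[float]


def disagreement(member_predictions: Sequence[Vector]) -> float:
    """Uncertainty proxy (assumed form): variance across ensemble members,
    averaged over prediction dimensions. member_predictions[k][d] is the
    prediction of member k for dimension d, all for the same input."""
    n_dims = len(member_predictions[0])
    per_dim = [
        pvariance([pred[d] for pred in member_predictions])
        for d in range(n_dims)
    ]
    return sum(per_dim) / n_dims


def intrinsic_reward(uncertainty: float, scale: float = 1.0) -> float:
    """Exploration bonus proportional to disagreement.
    `scale` is a hypothetical weighting knob, not taken from the paper."""
    return scale * uncertainty


def filter_rollouts(
    rollout_uncertainties: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return indices of imagined rollouts whose every step stays at or
    below `threshold`. A fixed threshold is an assumption; the abstract
    describes the filter as dynamic without giving the rule."""
    return [
        i
        for i, steps in enumerate(rollout_uncertainties)
        if max(steps) <= threshold
    ]


# Toy ensemble outputs: three members, two prediction dimensions.
agree = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]      # members agree
disagree = [[0.0, 2.0], [1.0, 2.0], [2.0, 2.0]]   # members differ in dim 0

u_low = disagreement(agree)
u_high = disagreement(disagree)

# Rollout 0 stays in the low-uncertainty region; rollout 1 leaves it.
kept = filter_rollouts([[u_low, u_low], [u_low, u_high]], threshold=0.1)
```

In this toy setup, the input on which members disagree receives the larger exploration bonus, while the imagined rollout that passes through it is excluded from the set used for policy optimization.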
