Reinforcement Learning Without Explicit Rewards: Theory and Practice

Authors

  • Weitong Zhang, University of North Carolina at Chapel Hill

DOI:

https://doi.org/10.1609/aaai.v40i47.41364

Abstract

In this New Faculty Highlights talk, I begin with three lines of work: reward-free exploration, which learns broad state and skill coverage from intrinsic rewards and remains robust under misspecification during efficient fine-tuning; guided generation methods that preserve the prior policy and mitigate reward hacking; and AI for science and healthcare, including practical RL for autonomous laboratories and automatic diagnosis. Building on the impact of this work, as evidenced by publications, adoption, and awards, my future work will pursue two directions: imitation learning and contextual multi-task RL that connect behavioral cloning with interactive policies without explicit reward design; and personalized, multi-task offline-to-online adaptation with in-context demonstrations. In parallel, I am broadening the impact of AI for science and healthcare through existing collaborations. I will close by surveying these results and outlining an agenda for reinforcement learning without explicit rewards.

Published

2026-03-14

How to Cite

Zhang, W. (2026). Reinforcement Learning Without Explicit Rewards: Theory and Practice. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39847–39847. https://doi.org/10.1609/aaai.v40i47.41364