Efficient Preference Alignment via Pareto Exploration (Student Abstract)

Pengfei Liu; Rui Kong; Zongzhang Zhang

doi:10.1609/aaai.v40i48.42242

Authors

Pengfei Liu: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
Rui Kong: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
Zongzhang Zhang: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China

Abstract

Hand-crafted reward engineering requires domain knowledge and extensive trial and error, while Preference-based Reinforcement Learning (PbRL) avoids manual reward design but often suffers from limited interpretability and unstable training. To address these issues, we propose a novel preference alignment framework. Our approach leverages large language models to generate sub-reward functions informed by prior knowledge, and then aligns the combined reward with human preferences by optimizing the weights that combine these sub-rewards. For policy learning, we introduce Policy Optimization via Pareto Regularization (POPR), which regularizes policy updates along Pareto-optimal directions. Experiments show that our framework improves reward quality and policy stability, outperforming expert-designed rewards on most tasks.
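The abstract does not state how the sub-reward weights are fit. As a minimal sketch, assuming the combined reward is a linear weighted sum of per-segment sub-reward totals and that human preferences follow the Bradley-Terry model commonly used in PbRL, the weights can be fit by gradient descent on the negative log-likelihood of pairwise preference labels. The feature vectors, the synthetic "true" weights, and the helpers `combined_return` and `fit_weights` are hypothetical illustrations, not the authors' implementation; POPR is not shown.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_return(weights, features):
    """Linear combination of a segment's sub-reward totals (assumed form)."""
    return sum(w * f for w, f in zip(weights, features))


def fit_weights(pairs, num_subrewards, lr=0.5, epochs=1000, l2=1e-3):
    """Fit sub-reward weights by full-batch gradient descent on the
    Bradley-Terry negative log-likelihood plus a small L2 penalty.

    pairs: list of (features_a, features_b, label), where label = 1.0 if
    segment a is preferred and 0.0 if segment b is preferred.
    """
    w = [0.0] * num_subrewards
    n = len(pairs)
    for _ in range(epochs):
        grad = [l2 * wk for wk in w]  # gradient of (l2 / 2) * ||w||^2
        for fa, fb, y in pairs:
            # Bradley-Terry: P(a preferred over b) = sigmoid(R(a) - R(b))
            p = sigmoid(combined_return(w, fa) - combined_return(w, fb))
            for k in range(num_subrewards):
                # d(NLL)/d(w_k) = (p - y) * (phi_k(a) - phi_k(b)), averaged over pairs
                grad[k] += (p - y) * (fa[k] - fb[k]) / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# Usage on synthetic data: a hidden "true" weighting produces the labels.
rng = random.Random(0)
true_w = [2.0, -1.0, 0.0]  # hypothetical ground-truth preference weights
pairs = []
for _ in range(200):
    fa = [rng.random() for _ in range(3)]
    fb = [rng.random() for _ in range(3)]
    label = 1.0 if combined_return(true_w, fa) > combined_return(true_w, fb) else 0.0
    pairs.append((fa, fb, label))

w_fit = fit_weights(pairs, num_subrewards=3)
accuracy = sum(
    (combined_return(w_fit, fa) > combined_return(w_fit, fb)) == (y == 1.0)
    for fa, fb, y in pairs
) / len(pairs)
print("fitted weights:", [round(x, 2) for x in w_fit])
print("training preference accuracy:", accuracy)
```

Because the Bradley-Terry likelihood depends only on reward differences, perfectly consistent preference labels would let unregularized weights grow without bound; the small L2 penalty keeps them finite. Keeping the reward as an explicit weighted sum of named sub-rewards is what makes the learned weights directly inspectable.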
