KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Authors

  • Ruizhe Zhang, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China
  • Yongxin Xu, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China
  • Yuzhen Xiao, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China
  • Runchuan Zhu, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China
  • Xinke Jiang, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China
  • Xu Chu, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China; Center on Frontiers of Computing Studies, Peking University, Beijing, China; Peking University Information Technology Institute (Tianjin Binhai)
  • Junfeng Zhao, School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China; Nanhu Laboratory, Jiaxing, China
  • Yasha Wang, Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China; National Engineering Research Center For Software Engineering, Peking University, Beijing, China; Peking University Information Technology Institute (Tianjin Binhai)

DOI:

https://doi.org/10.1609/aaai.v39i24.34783

Abstract

By integrating external knowledge, Retrieval-Augmented Generation (RAG) has become an effective strategy for mitigating the hallucination problems that large language models (LLMs) encounter in knowledge-intensive tasks. However, when integrating external non-parametric supporting evidence with internal parametric knowledge, knowledge conflicts inevitably arise, leading to confusion in the model's responses. To enhance the knowledge selection of LLMs across varying contexts, some research has focused on refining their behavior patterns through instruction tuning. Nonetheless, due to the absence of explicit negative signals and comparative objectives, models fine-tuned in this manner may still exhibit undesirable behaviors such as contextual ignorance and contextual overinclusion. To this end, we propose a Knowledge-aware Preference Optimization strategy, dubbed KnowPO, aimed at adaptive knowledge selection based on contextual relevance in real retrieval scenarios. Concretely, we propose a general paradigm for constructing knowledge-conflict datasets that comprehensively covers these error types, and we train the model to avoid such negative signals through preference optimization. We also propose a rewriting strategy and a data-ratio optimization strategy to address preference imbalances. Experimental results show that KnowPO outperforms previous methods for handling knowledge conflicts by over 37%, while also exhibiting robust generalization across various out-of-distribution datasets.
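The preference optimization mentioned in the abstract typically follows a DPO-style objective over (chosen, rejected) answer pairs, where the rejected answer exemplifies a negative behavior such as contextual ignorance. The paper's exact loss is not given on this page, so the following is only a minimal illustrative sketch; the function name, the scalar inputs, and the `beta` parameter are hypothetical conveniences, not the authors' implementation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one (chosen, rejected) preference pair.

    logp_* are summed token log-probabilities of each answer under the
    policy being trained; ref_logp_* are the same quantities under the
    frozen reference model. All values are illustrative scalars.
    """
    # Implicit rewards: how much the policy has shifted from the
    # reference model on each answer, scaled by beta.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Sigmoid cross-entropy on the reward margin: minimizing this pushes
    # the policy toward the chosen (context-appropriate) answer and away
    # from the rejected (e.g., context-ignoring) one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With no shift from the reference model the margin is zero and the loss is log 2; as the policy separates the chosen answer from the rejected one, the loss decreases.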

Published

2025-04-11

How to Cite

Zhang, R., Xu, Y., Xiao, Y., Zhu, R., Jiang, X., Chu, X., … Wang, Y. (2025). KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25895–25903. https://doi.org/10.1609/aaai.v39i24.34783

Section

AAAI Technical Track on Natural Language Processing III