CultureRL: Internalizing Cultural Principles in Large Language Models via Norm-Driven Reinforcement Learning

Authors

  • Weixiang Zhao, Harbin Institute of Technology
  • Haozhen Li, Harbin Institute of Technology
  • Yanyan Zhao, Harbin Institute of Technology
  • Haixiao Liu, Du Xiaoman Financial
  • Biye Li, Du Xiaoman Financial
  • Ting Liu, Harbin Institute of Technology
  • Bing Qin, Harbin Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i44.41150

Abstract

As large language models (LLMs) are increasingly deployed across culturally diverse regions, ensuring that their responses align with users’ cultural norms has become a critical challenge. Existing approaches to cultural alignment rely primarily on prompting or data-augmentation-based supervised fine-tuning, which teach models to follow norms indirectly through example-based supervision. However, these methods are difficult to scale and often fail to generalize, particularly in low-resource cultural settings. In this work, we propose CultureRL, a culture-norm-driven reinforcement learning framework that directly encodes cultural principles into model behavior. Rather than relying on output imitation, CultureRL provides normative feedback during training, enabling the model to internalize high-level cultural rules. It consists of two key components: (1) Norm Pool Construction (NPC), which clusters data from the World Values Survey into abstract cultural concepts to form a structured, retrievable norm pool; and (2) the Norm Cluster-based Reward Mechanism (NCRM), which retrieves the relevant norm for each input and uses an external reward model to assess conformity, guiding model updates toward cultural alignment. We evaluate CultureRL in both one-for-one (per-culture) and one-for-all (multi-culture) settings across nine cultures and three benchmarks. Results show that CultureRL consistently outperforms strong baselines, particularly in cultural consistency and adaptability.
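The reward pipeline described in the abstract (retrieve the relevant norm for an input, then score a candidate response for conformity) can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's implementation: the keyword-overlap retrieval and scoring stand in for the paper's cluster-based retrieval and external reward model, and all names (`Norm`, `retrieve_norm`, `conformity_reward`) are hypothetical.

```python
# Toy sketch of a norm-retrieval-plus-reward loop in the spirit of NCRM.
# Keyword overlap is a crude stand-in for cluster-based retrieval and for
# the external reward model used in the actual framework.

from dataclasses import dataclass


@dataclass
class Norm:
    concept: str          # abstract cultural concept (cluster label)
    statement: str        # normative principle distilled from survey data
    keywords: frozenset   # crude retrieval key for this sketch


def retrieve_norm(prompt: str, norm_pool: list) -> Norm:
    """Pick the norm whose keywords overlap the prompt most."""
    tokens = set(prompt.lower().split())
    return max(norm_pool, key=lambda n: len(n.keywords & tokens))


def conformity_reward(response: str, norm: Norm) -> float:
    """Score conformity as the fraction of norm keywords echoed in the
    response, yielding a reward in [0, 1]."""
    tokens = set(response.lower().split())
    return len(norm.keywords & tokens) / max(len(norm.keywords), 1)


pool = [
    Norm("family", "Prioritize family obligations.",
         frozenset({"family", "parents"})),
    Norm("work", "Value diligence at work.",
         frozenset({"work", "job"})),
]

norm = retrieve_norm("Should I visit my parents this weekend?", pool)
reward = conformity_reward("Yes, family comes first.", norm)
```

In an RL setup, `reward` would then feed a policy-gradient update so the model internalizes the norm rather than imitating example outputs.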

Published

2026-03-14

How to Cite

Zhao, W., Li, H., Zhao, Y., Liu, H., Li, B., Liu, T., & Qin, B. (2026). CultureRL: Internalizing Cultural Principles in Large Language Models via Norm-Driven Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 38120-38128. https://doi.org/10.1609/aaai.v40i44.41150

Section

AAAI Special Track on AI Alignment