Huang, H., Yang, Y., Sun, H., Li, J., & Gao, Y. (2026). Simulated Rewards, Skewed Strategies: Tracing the Acquired Preference Bias in LLM-Based Dialogue Planners. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21948–21956. https://doi.org/10.1609/aaai.v40i26.39348