[1] H. Huang, Y. Yang, H. Sun, J. Li, and Y. Gao, “Simulated Rewards, Skewed Strategies: Tracing the Acquired Preference Bias in LLM-Based Dialogue Planners,” AAAI, vol. 40, no. 26, pp. 21948–21956, Mar. 2026.