DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors

Authors

  • Tazeek Bin Abdur Rakib (Monash University, Malaysia)
  • Ambuj Mehrish (Singapore University of Technology and Design)
  • Lay-Ki Soon (Monash University, Malaysia)
  • Wern Han Lim (Monash University, Malaysia)
  • Soujanya Poria (Nanyang Technological University)

DOI:

https://doi.org/10.1609/aaai.v40i36.40244

Abstract

Large-language-model (LLM) agents excel at reactive dialogue but struggle with proactive, goal-driven interactions due to myopic decoding and costly planning. We introduce DialogXpert, which leverages a frozen LLM to propose a small, high-quality set of candidate actions per turn and employs a compact Q-network over fixed BERT embeddings, trained via temporal-difference learning, to select optimal moves within this reduced space. By tracking the user's emotions, DialogXpert tailors each decision to advance the task while nurturing a genuine, empathetic connection. Across negotiation, emotional support, and tutoring benchmarks, DialogXpert drives conversations to under 3 turns with success rates exceeding 94% and, with a larger LLM prior, pushes success above 97% while markedly improving negotiation outcomes. This framework delivers real-time, strategic, and emotionally intelligent dialogue planning at scale.
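The selection step described in the abstract, a value function scoring a small set of proposed actions and updated by temporal-difference learning, can be illustrated with a minimal sketch. This is not the authors' implementation: a linear Q-function stands in for the compact Q-network, short hand-written vectors stand in for the fixed BERT embeddings, and a hard-coded candidate list stands in for the frozen LLM's proposals. All function names, hyperparameters, and rewards below are hypothetical.

```python
import random

def q_value(weights, state_vec, action_vec):
    """Linear Q(s, a): dot product of weights with concatenated features.
    (Assumption: a linear model stands in for the paper's Q-network.)"""
    features = state_vec + action_vec
    return sum(w * x for w, x in zip(weights, features))

def select_action(weights, state_vec, candidates, epsilon=0.0, rng=random):
    """Epsilon-greedy choice restricted to the proposed candidate set."""
    if rng.random() < epsilon:
        return rng.randrange(len(candidates))
    scores = [q_value(weights, state_vec, a) for a in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

def td_update(weights, state_vec, action_vec, reward,
              next_state_vec, next_candidates, done, lr=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Returns the updated weights and the TD error."""
    target = reward
    if not done:
        target += gamma * max(
            q_value(weights, next_state_vec, a) for a in next_candidates
        )
    td_error = target - q_value(weights, state_vec, action_vec)
    features = state_vec + action_vec
    new_weights = [w + lr * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error

# Toy setup: one dialogue state and two candidate actions.
# In the paper's setting these vectors would be fixed text embeddings and
# the candidates would come from the LLM prior; here they are made up.
state = [1.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]
rewards = [1.0, 0.0]  # hypothetical: the first action ends the dialogue well

trained_weights = [0.0] * 3
for _ in range(200):
    for idx, action in enumerate(candidates):
        trained_weights, _err = td_update(
            trained_weights, state, action, rewards[idx],
            state, candidates, done=True,
        )

best = select_action(trained_weights, state, candidates)
print("greedy choice among candidates:", best)
```

Restricting both the greedy choice and the bootstrapped maximum to the candidate list is what keeps each step cheap in this sketch: the value function only ever scores a handful of actions per turn rather than the full action space.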

Published

2026-03-14

How to Cite

Abdur Rakib, T. B., Mehrish, A., Soon, L.-K., Lim, W. H., & Poria, S. (2026). DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 29967–29975. https://doi.org/10.1609/aaai.v40i36.40244

Section

AAAI Technical Track on Natural Language Processing I