Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems
DOI:
https://doi.org/10.1609/aaai.v39i23.34681
Abstract
Post-processing networks (PPNs) are components that modify the outputs of arbitrary modules in task-oriented dialogue systems and are optimized using reinforcement learning (RL) to improve the overall task completion capability of the system. However, previous PPN-based approaches have been limited to handling only a subset of modules within a system, which poses a significant limitation in improving the system performance. In this study, we propose a joint optimization method for post-processing the outputs of all modules using universal post-processing networks (UniPPNs), which are language-model-based networks that can modify the outputs of arbitrary modules in a system as a sequence-transformation task. Moreover, our RL algorithm, which employs a module-level Markov decision process, enables fine-grained value and advantage estimation for each module, thereby stabilizing joint learning for post-processing the outputs of all modules. Through both simulation-based and human evaluation experiments using the MultiWOZ dataset, we demonstrated that UniPPN outperforms conventional PPNs in the task completion capability of task-oriented dialogue systems.
Published
2025-04-11
How to Cite
Ohashi, A., & Higashinaka, R. (2025). Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24975–24983. https://doi.org/10.1609/aaai.v39i23.34681
Section
AAAI Technical Track on Natural Language Processing II