Reward-on-the-Line: A Novel Offline Reinforcement Learning Method for Building Legal Conversational Agents

Xubo Lin; Mingze Wang; Grace Hui Yang; Daniel Chen

doi:10.1609/aies.v8i2.36657

Authors

Xubo Lin Georgetown University, USA
Mingze Wang Georgetown University, USA
Grace Hui Yang Georgetown University, USA
Daniel Chen Université Toulouse Capitole, France


Abstract

Offline reinforcement learning (RL) offers a promising path for training domain-specific conversational agents (CAs) using large-scale historical dialogue data, without the need for costly online interactions or human annotations. In the legal domain, vast amounts of publicly available courtroom transcripts provide a rich and underutilized resource for developing intelligent legal CAs. However, offline training suffers from distribution shift between the learned policy and the behavior policy embedded in the training data, which can degrade agent performance at deployment. We address this challenge with a novel offline RL method, Reward-on-the-Line (ROL), which calibrates rewards based on action-selection agreement among an ensemble of CAs. We apply ROL to the U.S. Supreme Court dataset to demonstrate its effectiveness in learning proactive, legally informed dialogue strategies from historical court proceedings. To show the broader applicability of our approach, we also evaluate ROL on the CraigslistBargain negotiation dataset. Results in both domains confirm that ROL reduces distribution shift and improves agent performance in unseen dialogue scenarios.
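The abstract says only that ROL calibrates rewards using action-selection agreement among an ensemble of CAs; it does not give the agreement measure or the calibration rule. The sketch below is a hypothetical illustration of that general idea, not the authors' method. It assumes (1) agreement is the fraction of ensemble-member pairs that pick the same discrete dialogue action in a given state, and (2) the reward is reduced linearly as agreement drops, with a hypothetical penalty weight `lam`. The function names `pairwise_agreement` and `calibrated_reward` are illustrative only.

```python
from itertools import combinations
from typing import Sequence


def pairwise_agreement(actions: Sequence[int]) -> float:
    """Fraction of ensemble-member pairs that select the same action.

    `actions[i]` is the discrete action chosen by ensemble member i
    for a single dialogue state (an assumed representation).
    """
    pairs = list(combinations(actions, 2))
    if not pairs:  # a one-member "ensemble" trivially agrees with itself
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


def calibrated_reward(reward: float, actions: Sequence[int], lam: float = 1.0) -> float:
    """Hypothetical calibration rule (an assumption, not taken from the paper).

    Low ensemble agreement is treated as a signal that the state lies far from
    the behavior policy's data, so the reward is shrunk by lam * (1 - agreement).
    """
    return reward - lam * (1.0 - pairwise_agreement(actions))
```

For example, if four members all choose action 2, agreement is 1.0 and the reward is unchanged; if they choose four different actions, agreement is 0.0 and the reward drops by the full `lam`. A linear penalty is only one plausible choice; a multiplicative scaling such as `reward * agreement` would follow the same intuition.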
