DMT-RoleBench: A Dynamic Multi-Turn Dialogue Based Benchmark for Role-Playing Evaluation of Large Language Model and Agent

Dingbo Yuan; Yipeng Chen; Guodong Liu; Chenchen Li; Chengfu Tang; Dongxu Zhang; Zhenkui Wang; Xudong Wang; Song Liu

doi:10.1609/aaai.v39i24.34768

Authors

Dingbo Yuan Ant Group
Yipeng Chen Ant Group
Guodong Liu Ant Group
Chenchen Li Ant Group
Chengfu Tang Ant Group
Dongxu Zhang Ant Group
Zhenkui Wang Ant Group
Xudong Wang Ant Group
Song Liu Ant Group

Abstract

Recent years have witnessed a profound evolution in the abilities of large language models, which has significantly boosted the proliferation of role-playing agents and platforms. Nonetheless, there is a conspicuous absence of systematic and comprehensive evaluations of role-playing abilities that are truly aligned with users' real-world interaction scenarios. To address this gap, we have devised DMT-RoleBench, a benchmark designed to evaluate the role-playing abilities of large language models and agents based on dynamic multi-turn dialogues. Compared with existing role-playing benchmarks, DMT-RoleBench offers several principal advantages: (1) It contains more diverse role types and system prompts in different formats. (2) We propose an innovative evaluation paradigm that assesses role-playing abilities by dynamically generating multi-turn dialogues constrained by specific evaluation intents and topics, which is well aligned with users' real-world interaction scenarios. (3) We define a three-tiered metric system and provide DMT-RM, a reward model aligned with human annotations, to annotate the dialogues. We also propose DMT-Score to calculate the final scores from the annotated dialogues. Our experiments and analysis of leading models equipped with role-playing abilities demonstrate the effectiveness of DMT-RoleBench.
