SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement
DOI
https://doi.org/10.1609/aaai.v40i36.40338
Abstract
Speech-to-Speech (S2S) Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling end-to-end spoken dialogue systems. However, evaluating these models remains a fundamental challenge. We propose SageLM, an end-to-end, multi-aspect, and explainable speech LLM for comprehensive evaluation of S2S LLMs. First, unlike cascaded approaches that disregard acoustic features, SageLM jointly assesses both semantic and acoustic dimensions. Second, it leverages rationale-based supervision to enhance explainability and guide model learning, achieving superior alignment with evaluation outcomes compared to rule-based reinforcement learning methods. Third, we introduce SpeechFeedback, a synthetic preference dataset, and employ a two-stage training paradigm to mitigate the scarcity of speech preference data. Trained on both semantic and acoustic dimensions, SageLM achieves an 82.79% agreement rate with human evaluators, outperforming cascaded and SLM-based baselines by at least 7.42% and 26.20%, respectively.
Published
2026-03-14
How to Cite
Ge, Y., Zhang, J., Liu, X., Li, B., Ma, X., Wang, C., Ye, K., Du, Y., Zhang, L., Huang, Y., Xiao, T., Yu, Z., & Zhu, J. (2026). SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 30807-30815. https://doi.org/10.1609/aaai.v40i36.40338
Section
AAAI Technical Track on Natural Language Processing I
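The abstract reports agreement rates between the judge model and human evaluators. A minimal sketch of how such a pairwise agreement rate is typically computed — the function name and toy data below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch: agreement rate as the fraction of evaluation items
# where the judge model's preferred response matches the human preference.

def agreement_rate(judge_prefs, human_prefs):
    """Fraction of items where judge and human pick the same winner."""
    assert len(judge_prefs) == len(human_prefs) and judge_prefs
    matches = sum(j == h for j, h in zip(judge_prefs, human_prefs))
    return matches / len(judge_prefs)

# Toy example: "A"/"B" marks which of two candidate responses was preferred.
judge = ["A", "B", "A", "A", "B"]
human = ["A", "B", "B", "A", "B"]
print(f"{agreement_rate(judge, human):.2%}")  # -> 80.00%
```

In practice, such evaluations often also report chance-corrected measures (e.g. Cohen's kappa) alongside raw agreement; the abstract specifies only the raw agreement rate.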