Reinforce Trustworthiness in Multimodal Emotional Support System

Authors

  • Huy M. Le, Mohamed bin Zayed University of Artificial Intelligence; University of Information Technology, Vietnam National University
  • Dat Tien Nguyen, Mohamed bin Zayed University of Artificial Intelligence; University of Information Technology, Vietnam National University
  • Ngan T. T. Vo, University of Information Technology, Vietnam National University
  • Tuan D. Q. Nguyen, University of Information Technology, Vietnam National University
  • Nguyen Le Binh, University of Information Technology, Vietnam National University
  • Duy Minh Ho Nguyen, German Research Center for Artificial Intelligence (DFKI); University of Stuttgart; Max Planck Research School for Intelligent Systems (IMPRS-IS)
  • Daniel Sonntag, German Research Center for Artificial Intelligence (DFKI); University of Oldenburg
  • Lizi Liao, Singapore Management University
  • Binh T. Nguyen, University of Science, Vietnam National University

DOI:

https://doi.org/10.1609/aaai.v40i37.40412

Abstract

In today’s world, emotional support is increasingly essential, yet it remains challenging both for those seeking help and for those offering it. Multimodal approaches to emotional support show great promise: by integrating diverse data sources, they can provide empathetic, contextually relevant responses and foster more effective interactions. However, current methods have notable limitations: they often rely solely on text, convert other modalities into text, or perform emotion recognition alone, overlooking the full potential of multimodal inputs. Moreover, many studies prioritize response generation without accurately identifying critical emotional support elements or ensuring the reliability of outputs. To overcome these issues, we introduce MultiMood, a new framework that (i) leverages multimodal embeddings from video, audio, and text to predict emotional components and to produce responses aligned with professional therapeutic standards. To improve trustworthiness, we (ii) incorporate novel psychological criteria and apply Reinforcement Learning (RL) to optimize large language models (LLMs) for consistent adherence to these standards. We also (iii) analyze several advanced LLMs to assess their multimodal emotional support capabilities. Experimental results show that MultiMood achieves state-of-the-art performance on the MESC and DFEW datasets, while its RL-driven trustworthiness improvements are validated through human and LLM evaluations, demonstrating its superior capability as a multimodal framework in this domain.

Published

2026-03-14

How to Cite

Le, H. M., Nguyen, D. T., Vo, N. T. T., Nguyen, T. D. Q., Binh, N. L., Nguyen, D. M. H., Sonntag, D., Liao, L., & Nguyen, B. T. (2026). Reinforce Trustworthiness in Multimodal Emotional Support System. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 31474-31482. https://doi.org/10.1609/aaai.v40i37.40412

Issue

Section

AAAI Technical Track on Natural Language Processing II