SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
DOI:
https://doi.org/10.1609/aaai.v37i11.26562
Keywords:
SNLP: Conversational AI/Dialogue Systems, CV: Multi-modal Vision, SNLP: Generation, SNLP: Language Models, SNLP: Question Answering
Abstract
Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph (SPRING), with the ability to reason over multi-hop spatial relations and connect them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs used during pretraining are generated from novel Incremental Layout Graphs (ILG). Difficulty labels automatically annotated by the ILG are used to promote MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets. We release our code and data at https://github.com/LYX0501/SPRING.
Published
2023-06-26
How to Cite
Long, Y., Hui, B., Ye, F., Li, Y., Han, Z., Yuan, C., Li, Y., & Wang, X. (2023). SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13309-13317. https://doi.org/10.1609/aaai.v37i11.26562
Issue
Section
AAAI Technical Track on Speech & Natural Language Processing