Bridging the Gap between Expression and Scene Text for Referring Expression Comprehension (Student Abstract)

Yuqi Bu; Jiayuan Xie; Liuwu Li; Qiong Liu; Yi Cai

doi:10.1609/aaai.v36i11.21597

Authors

Yuqi Bu South China University of Technology
Jiayuan Xie South China University of Technology
Liuwu Li South China University of Technology
Qiong Liu South China University of Technology
Yi Cai South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v36i11.21597

Keywords:

Referring Expression Comprehension, Scene Text, Multi-modal Alignment

Abstract

Referring expression comprehension aims to ground the object in an image that an expression refers to. Scene text that serves as an identifier is naturally well suited to referring to objects. However, existing methods consider only the text in the expression and ignore the text in the image, leading to a mismatch between the two. In this paper, we propose a novel model that recognizes scene text. We assign the extracted scene text to its corresponding visual region and ground the target object under the guidance of the expression. Experimental results on two benchmarks demonstrate the effectiveness of our model.
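
The two steps named in the abstract (assigning scene text to a visual region, then grounding under the guidance of the expression) can be illustrated with a toy sketch. This is not the authors' model, which the abstract does not specify. It assumes that region boxes and OCR tokens are already available, and it swaps in simple box overlap and word matching for whatever learned alignment the paper uses. The names Region, overlap_ratio, assign_scene_text and ground are hypothetical, as are the example boxes and words.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Region:
    """A candidate object region plus the scene text assigned to it."""
    name: str
    box: Box
    scene_text: List[str] = field(default_factory=list)


def overlap_ratio(text_box: Box, region_box: Box) -> float:
    """Fraction of the text box's area that lies inside the region box."""
    x1 = max(text_box[0], region_box[0])
    y1 = max(text_box[1], region_box[1])
    x2 = min(text_box[2], region_box[2])
    y2 = min(text_box[3], region_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (text_box[2] - text_box[0]) * (text_box[3] - text_box[1])
    return inter / area if area > 0 else 0.0


def assign_scene_text(regions: List[Region],
                      ocr_tokens: List[Tuple[str, Box]]) -> None:
    """Attach each OCR word to the region that covers most of its box."""
    for word, text_box in ocr_tokens:
        best = max(regions, key=lambda r: overlap_ratio(text_box, r.box))
        if overlap_ratio(text_box, best.box) > 0:
            best.scene_text.append(word.lower())


def ground(expression: str, regions: List[Region]) -> Region:
    """Pick the region whose scene text shares the most words with the expression."""
    words = set(expression.lower().split())
    return max(regions, key=lambda r: len(words & set(r.scene_text)))


# Made-up example: two buses, each showing a different route number.
regions = [
    Region("bus_left", (0, 0, 100, 100)),
    Region("bus_right", (120, 0, 220, 100)),
]
ocr_tokens = [("42", (30, 20, 50, 40)), ("17", (150, 20, 170, 40))]

assign_scene_text(regions, ocr_tokens)
target = ground("the bus with number 17", regions)
print(target.name)
```

A method that reads only the expression has no way to tell the two buses apart here, because the distinguishing cue ("17") exists only as text in the image. That is the mismatch the abstract describes.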

Published

2022-06-28

How to Cite

Bu, Y., Xie, J., Li, L., Liu, Q., & Cai, Y. (2022). Bridging the Gap between Expression and Scene Text for Referring Expression Comprehension (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12921–12922. https://doi.org/10.1609/aaai.v36i11.21597

Issue

Vol. 36 No. 11: IAAI-22, EAAI-22, AAAI-22 Special Programs and Special Track, Student Papers and Demonstrations

Section

AAAI Student Abstract and Poster Program
