DiffusionREC: Diffusion Model with Adaptive Condition for Referring Expression Comprehension

Jingcheng Ke; Waikeung Wong; Jia Wang; Mu Li; Lunke Fei; Jie Wen

doi:10.1609/aaai.v39i4.32443

Authors

Jingcheng Ke, Guangdong University of Technology, Guangzhou, China
Waikeung Wong, School of Fashion and Textiles, The Hong Kong Polytechnic University, Hong Kong
Jia Wang, Guangdong Pharmaceutical University, Guangzhou, China
Mu Li, Harbin Institute of Technology, Shenzhen, China
Lunke Fei, Guangdong University of Technology, Guangzhou, China
Jie Wen, Harbin Institute of Technology, Shenzhen, China

DOI: https://doi.org/10.1609/aaai.v39i4.32443

Abstract

Referring expression comprehension (REC) aims to localize the object in an image that a given expression describes. Existing REC methods, including transformer-based and graph-based approaches, have shown strong performance on this task. In this study, we present DiffusionREC, a framework that recasts REC as a text-guided bounding-box denoising diffusion process in which noisy bounding boxes are progressively refined and filtered to pinpoint the target box. During training, the bounding box of the target object is diffused from its ground-truth position towards a random distribution, and a filtering-based object decoder learns to reverse this diffusion, conditioned on the given expression, the result of the previous denoising step, and the interaction between the expression and the image. At inference, we begin with a set of randomly generated boxes and apply the filtering-based object decoder iteratively to refine and prune them under the same three conditions. Extensive experiments on six datasets demonstrate that DiffusionREC outperforms previous REC methods.
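
The training and inference procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the cosine noise schedule, the four-number box representation, the function names, and in particular toy_decoder, a hypothetical stand-in for the filtering-based object decoder that is handed the target box directly so the loop can run without an image or an expression.

```python
import math
import random


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level at step t for a cosine schedule (assumed, not from the paper)."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def diffuse_box(box, t, T, rng):
    """Forward process: blend the ground-truth box with Gaussian noise at step t."""
    a = cosine_alpha_bar(t, T)
    return [math.sqrt(a) * c + math.sqrt(1 - a) * rng.gauss(0.0, 1.0) for c in box]


def toy_decoder(boxes, target, keep):
    """Hypothetical stand-in for the filtering-based object decoder.

    A real decoder would be conditioned on the expression, the previous
    denoising result, and expression-image interaction. This toy version
    is given the target box, moves every candidate halfway toward it,
    and keeps only the `keep` closest candidates.
    """
    refined = [[b + 0.5 * (g - b) for b, g in zip(box, target)] for box in boxes]
    refined.sort(key=lambda box: sum((b - g) ** 2 for b, g in zip(box, target)))
    return refined[:keep]


def infer(target, num_boxes=8, steps=4, seed=0):
    """Inference loop: start from random boxes, then refine and prune iteratively."""
    rng = random.Random(seed)
    boxes = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(num_boxes)]
    for _ in range(steps):
        keep = max(1, len(boxes) // 2)  # prune half of the candidates each step
        boxes = toy_decoder(boxes, target, keep)
    return boxes[0]


if __name__ == "__main__":
    gt = [0.5, 0.5, 0.2, 0.3]  # illustrative box, e.g. normalized (cx, cy, w, h)
    print("noised box at t=500:", diffuse_box(gt, 500, 1000, random.Random(0)))
    print("box after inference:", infer(gt))
```

At step 0 the forward process returns the ground-truth box unchanged, and as the step index approaches T the box approaches pure noise, which mirrors the "ground-truth position towards a random distribution" description. The halving of the candidate set in infer is likewise only one simple way to model the pruning the abstract mentions.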
