Parallel Vertex Diffusion for Unified Visual Grounding
DOI:
https://doi.org/10.1609/aaai.v38i2.27896
Keywords:
CV: Language and Vision, CV: Object Detection & Categorization, CV: Segmentation
Abstract
Unified visual grounding (UVG) capitalizes on a wealth of task-related knowledge across various grounding tasks via one-shot training, which curtails retraining costs and task-specific architecture design efforts. Vertex generation-based UVG methods achieve this versatility by unifying the modeling of object box and contour prediction, and they provide a text-powered interface to a wide range of related multi-modal tasks, e.g., visual question answering and captioning. However, these methods typically generate vertexes sequentially through autoregression, which is prone to error accumulation and heavy computation, especially for high-dimensional sequence generation in complex scenarios. In this paper, we develop Parallel Vertex Diffusion (PVD), which exploits the parallelizability of diffusion models to generate vertexes accurately and efficiently in a parallel and scalable manner. Since coordinates fluctuate greatly, training diffusion models without geometry constraints typically suffers from slow convergence. We therefore complete our PVD with two critical components, i.e., a center anchor mechanism and an angle summation loss, which normalize coordinates and adopt a differentiable geometric descriptor from the point-in-polygon problem of computational geometry to constrain the overall difference between predicted and labeled vertexes. These innovative designs empower our PVD to achieve state-of-the-art performance across various grounding tasks.
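To make the angle summation idea concrete, below is a minimal, hypothetical sketch of the classical angle-summation (winding-number) descriptor from the point-in-polygon problem: summing the signed angles subtended at a query point by consecutive polygon vertices yields roughly 2π when the point lies inside the polygon and roughly 0 when it lies outside, and the expression is differentiable via atan2. This is not the authors' implementation; the function name, tensor shapes, and usage are illustrative assumptions only.

```python
# Hypothetical sketch of a differentiable angle-summation descriptor
# (point-in-polygon winding number); NOT the paper's official code.
import torch

def angle_summation(vertices: torch.Tensor, point: torch.Tensor) -> torch.Tensor:
    """vertices: (N, 2) ordered polygon vertices; point: (2,) query point."""
    d = vertices - point                       # (N, 2) vectors from point to each vertex
    d_next = torch.roll(d, shifts=-1, dims=0)  # vectors to the next vertex (wraps around)
    # Signed angle between consecutive direction vectors via atan2(cross, dot).
    cross = d[:, 0] * d_next[:, 1] - d[:, 1] * d_next[:, 0]
    dot = (d * d_next).sum(dim=-1)
    return torch.atan2(cross, dot).sum()       # ~2*pi inside, ~0 outside

# Example: unit square with one interior and one exterior query point.
square = torch.tensor([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
print(angle_summation(square, torch.tensor([0.5, 0.5])))  # ~6.28 (inside)
print(angle_summation(square, torch.tensor([2.0, 2.0])))  # ~0.0  (outside)
```

Because the descriptor is differentiable with respect to the vertex coordinates, a loss of this kind can, in principle, compare predicted and ground-truth vertex sets as whole polygons rather than point by point.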
Published
2024-03-24
How to Cite
Cheng, Z., Li, K., Jin, P., Li, S., Ji, X., Yuan, L., Liu, C., & Chen, J. (2024). Parallel Vertex Diffusion for Unified Visual Grounding. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1326-1334. https://doi.org/10.1609/aaai.v38i2.27896
Section
AAAI Technical Track on Computer Vision I