Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition
DOI:
https://doi.org/10.1609/aaai.v39i24.34711
Abstract
Grounded Multimodal Named Entity Recognition (GMNER) is an emerging information extraction (IE) task that aims to simultaneously extract entity spans, entity types, and the corresponding visual regions of entities from given sentence-image pairs. Recent unified methods based on machine reading comprehension or sequence generation frameworks show limitations in this difficult task. The former, which rely on human-designed type queries, struggle to differentiate ambiguous entities, such as Jordan (Person) and off-White x Jordan (Shoes). The latter, which follow a one-by-one decoding order, suffer from exposure bias. We maintain that these works misunderstand the relationships of multimodal entities. To tackle these issues, we propose a novel unified framework named Multi-grained Query-guided Set Prediction Network (MQSPN) to learn appropriate relationships at the intra-entity and inter-entity levels. Specifically, MQSPN explicitly aligns textual entities with visual regions by employing a set of learnable queries to strengthen intra-entity connections. Building on this distinct intra-entity modeling, MQSPN reformulates GMNER as a set prediction task, guiding the model to establish appropriate inter-entity relationships from an optimal global matching perspective. Additionally, we incorporate a query-guided Fusion Net (QFNet) as a glue network to improve the alignment of the two-level relationships. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on widely used benchmarks.
Published
2025-04-11
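The "optimal global matching" idea behind set prediction, as mentioned in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `optimal_matching`, the toy cost matrix, and the brute-force search are all assumptions made for illustration, and the abstract does not state how MQSPN actually defines its matching cost. The sketch only shows the general principle of assigning a set of predictions to a set of gold entities so that the total cost is minimal, regardless of the order in which predictions are produced.

```python
from itertools import permutations


def optimal_matching(cost):
    """Find the one-to-one assignment with the lowest total cost.

    cost[i][j] is the (hypothetical) cost of matching prediction i to
    gold entity j. The matrix is assumed to be square. This brute-force
    search tries every permutation, so it is only suitable for tiny sets.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        # perm[i] is the gold entity assigned to prediction i.
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Toy cost matrix (illustrative values only): 3 predictions x 3 gold entities.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
assignment, total = optimal_matching(cost)
print(assignment, round(total, 2))
```

In practice, set prediction models typically solve this assignment with the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`) rather than by enumeration; the brute-force version is used here only to keep the sketch dependency-free.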
How to Cite
Tang, J., Wang, Z., Gong, Z., Yu, J., Zhu, X., & Yin, J. (2025). Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25246–25254. https://doi.org/10.1609/aaai.v39i24.34711
Issue
Section
AAAI Technical Track on Natural Language Processing III