[1] M. Dai, J. Li, J. Zhuang, X. Zhang, and W. Yang, “Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints,” AAAI, vol. 39, no. 3, pp. 2618–2626, Apr. 2025.