[1] S. Zheng, “Look Around Before Locating: Considering Content and Structure Information for Visual Grounding”, AAAI, vol. 39, no. 2, pp. 1656–1664, Apr. 2025.