[1] Y. Zhan, Y. Yuan, and Z. Xiong, “Mono3DVG: 3D Visual Grounding in Monocular Images,” AAAI, vol. 38, no. 7, pp. 6988–6996, Mar. 2024.