Zhan, Yang, et al. “Mono3DVG: 3D Visual Grounding in Monocular Images”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 7, Mar. 2024, pp. 6988-96, doi:10.1609/aaai.v38i7.28525.