Zhan, Y., Yuan, Y., & Xiong, Z. (2024). Mono3DVG: 3D Visual Grounding in Monocular Images. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6988–6996. https://doi.org/10.1609/aaai.v38i7.28525