[1] Y. Li, “Mono3DVG-EnSD: Enhanced Spatial-aware and Dimension-decoupled Text Encoding for Monocular 3D Visual Grounding”, AAAI, vol. 40, no. 8, pp. 6726–6734, Mar. 2026.