ZHANG, Taolin; HE, Sunan; DAI, Tao; WANG, Zhi; CHEN, Bin; XIA, Shu-Tao. Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 38, n. 7, p. 7296–7304, 2024. DOI: 10.1609/aaai.v38i7.28559. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/28559. Accessed: 28 May 2026.