Zhang, Taolin, et al. “Vision-Language Pre-Training With Object Contrastive Learning for 3D Scene Understanding.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 7, Mar. 2024, pp. 7296-7304, doi:10.1609/aaai.v38i7.28559.