Hu, X., X. Yin, K. Lin, L. Zhang, J. Gao, L. Wang, and Z. Liu. “VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 2, May 2021, pp. 1575-83, doi:10.1609/aaai.v35i2.16249.