Hu, X., Yin, X., Lin, K., Zhang, L., Gao, J., Wang, L., & Liu, Z. (2021). VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1575-1583. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16249