Hu, X., Yin, X., Lin, K., Zhang, L., Gao, J., Wang, L., & Liu, Z. (2021). VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1575-1583. https://doi.org/10.1609/aaai.v35i2.16249