Zhang, W., Shi, H., Tang, S., Xiao, J., Yu, Q., & Zhuang, Y. (2021). Consensus Graph Representation Learning for Better Grounded Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3394-3402. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16452