Zhang, W., Shi, H., Tang, S., Xiao, J., Yu, Q., & Zhuang, Y. (2021). Consensus Graph Representation Learning for Better Grounded Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3394-3402. https://doi.org/10.1609/aaai.v35i4.16452