[1]

W. Zhang, H. Shi, S. Tang, J. Xiao, Q. Yu, and Y. Zhuang, “Consensus Graph Representation Learning for Better Grounded Image Captioning,” AAAI, vol. 35, no. 4, pp. 3394–3402, May 2021.