(1) Zhang, W.; Shi, H.; Tang, S.; Xiao, J.; Yu, Q.; Zhuang, Y. Consensus Graph Representation Learning for Better Grounded Image Captioning. AAAI 2021, 35, 3394-3402.