[1] J. Hou, X. Wu, X. Zhang, Y. Qi, Y. Jia, and J. Luo, “Joint Commonsense and Relation Reasoning for Image and Video Captioning,” AAAI, vol. 34, no. 07, pp. 10973–10980, Apr. 2020.