Hou, Jingyi, Xinxiao Wu, Xiaoxun Zhang, Yayun Qi, Yunde Jia, and Jiebo Luo. “Joint Commonsense and Relation Reasoning for Image and Video Captioning.” Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 10973–10980. Accessed April 17, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/6731.