Liu, Yuhang, et al. “Detection-Based Intermediate Supervision for Visual Question Answering.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 12, Mar. 2024, pp. 14061-68, doi:10.1609/aaai.v38i12.29315.