[1]
Y. Liu, D. Peng, W. Wei, Y. Fu, W. Xie, and D. Chen, “Detection-Based Intermediate Supervision for Visual Question Answering,” AAAI, vol. 38, no. 12, pp. 14061–14068, Mar. 2024.