Liu, Y., Peng, D., Wei, W., Fu, Y., Xie, W., & Chen, D. (2024). Detection-Based Intermediate Supervision for Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 14061-14068. https://doi.org/10.1609/aaai.v38i12.29315