[1] P. Shamsolmoali, M. Zareapoor, E. Granger, and M. Felsberg, “SeTformer Is What You Need for Vision and Language”, AAAI, vol. 38, no. 5, pp. 4713–4721, Mar. 2024.