Shamsolmoali, Pourya, Masoumeh Zareapoor, Eric Granger, and Michael Felsberg. 2024. “SeTformer Is What You Need for Vision and Language.” Proceedings of the AAAI Conference on Artificial Intelligence 38 (5): 4713–21. https://doi.org/10.1609/aaai.v38i5.28272.