Zhu, Zhihong, et al. “S³-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-Modal Sarcasm Detection.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 41, Mar. 2026, pp. 35266-74, doi:10.1609/aaai.v40i41.40834.