[1] Z. Zhu, “S³-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-modal Sarcasm Detection”, AAAI, vol. 40, no. 41, pp. 35266–35274, Mar. 2026.