1. Wu D, Jiang L, Fang R, B, Xie H, Su H, et al. Introducing Visual Scenes and Reasoning: A More Realistic Benchmark for Spoken Language Understanding. AAAI [Internet]. 2026 Mar. 14 [cited 2026 May 14];40(40):33899-907. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/40682