Wu, Di, Liting Jiang, Ruiyu Fang, Bianjing, Hongyan Xie, Haoxiang Su, Hao Huang, Zhongjiang He, Shuangyong Song, and Xuelong Li. 2026. “Introducing Visual Scenes and Reasoning: A More Realistic Benchmark for Spoken Language Understanding.” Proceedings of the AAAI Conference on Artificial Intelligence 40 (40): 33899–907. https://doi.org/10.1609/aaai.v40i40.40682.