1. Zhuang J, Lu L, Dai M, Hu R, Chen J, Liu Q, et al. ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming. AAAI [Internet]. 2025 Apr 11 [cited 2026 May 11];39(10):11049-57. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/33201