[1] Zhuang, J. et al. 2025. ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming. Proceedings of the AAAI Conference on Artificial Intelligence. 39, 10 (Apr. 2025), 11049–11057. DOI:https://doi.org/10.1609/aaai.v39i10.33201.