Zhuang, J., Lu, L., Dai, M., Hu, R., Chen, J., Liu, Q., & Hu, H. (2025). ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming. Proceedings of the AAAI Conference on Artificial Intelligence, 39(10), 11049–11057. https://doi.org/10.1609/aaai.v39i10.33201