Zhuang, J. (2025) “ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming”, Proceedings of the AAAI Conference on Artificial Intelligence, 39(10), pp. 11049–11057. doi: 10.1609/aaai.v39i10.33201.