Zhuang, J., Lu, L., Dai, M., Hu, R., Chen, J., Liu, Q., & Hu, H. (2026). Q Cache: Visual Attention Is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 14031–14039. https://doi.org/10.1609/aaai.v40i16.38414