Zhuang, Jiedong, et al. “Q Cache: Visual Attention Is Valuable in Less Than Half of Decode Layers for Multimodal Large Language Model”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 16, Mar. 2026, pp. 14031-9, doi:10.1609/aaai.v40i16.38414.