[1]
J. Zhuang, “Q Cache: Visual Attention Is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model”, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 16, pp. 14031–14039, Mar. 2026.