(1) Zhuang, J.; Lu, L.; Dai, M.; Hu, R.; Chen, J.; Liu, Q.; Hu, H. Q Cache: Visual Attention Is Valuable in Less Than Half of Decode Layers for Multimodal Large Language Model. AAAI 2026, 40, 14031-14039.