TIAN, Hao; LU, Sheng; TIAN, Fuwen; CUI, Guangming; LI, Zheng; ZHANG, Xuyun; SHENG, Quan Z.; DOU, Wanchun. DIAA: A Decoding-Efficient Inference Acceleration Approach for On-Device Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 40, n. 31, p. 25896–25904, 2026. DOI: 10.1609/aaai.v40i31.39789. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/39789. Accessed on: 14 May 2026.