Tian, Hao, Sheng Lu, Fuwen Tian, Guangming Cui, Zheng Li, Xuyun Zhang, Quan Z. Sheng, and Wanchun Dou. 2026. “DIAA: A Decoding-Efficient Inference Acceleration Approach for On-Device Large Language Models.” Proceedings of the AAAI Conference on Artificial Intelligence 40 (31): 25896–904. https://doi.org/10.1609/aaai.v40i31.39789.