Tian, H., Lu, S., Tian, F., Cui, G., Li, Z., Zhang, X., … Dou, W. (2026). DIAA: A Decoding-Efficient Inference Acceleration Approach for On-Device Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 25896–25904. https://doi.org/10.1609/aaai.v40i31.39789