Liu, C. (2026) “TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), pp. 18452–18459. doi: 10.1609/aaai.v40i22.38910.