(1) Liu, C.; Zhang, J.; Li, C.; Zhou, Z.; Wu, S.; Huang, S.; Duan, H. TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models. AAAI 2026, 40, 18452-18459.