[1] J. Zhao, Y. Huang, and F. Lu, “Learning Procedural-Aware Video Representations Through State-Grounded Hierarchy Unfolding,” AAAI, vol. 40, no. 16, pp. 13172–13180, Mar. 2026.