[1] L. Chen, “Human-Centric Video Generation via Collaborative Multi-Modal Conditioning”, AAAI, vol. 40, no. 4, pp. 2939–2947, Mar. 2026.