[1] H. Zhang, P. Hu, and W. E. Zhang, “LLaVA-MS-PIT: Multi-Modal Schema-Guided Progressive Instruction Tuning for Multi-Modal Event Extraction”, AAAI, vol. 40, no. 41, pp. 34692–34700, Mar. 2026.