Zhang, H., Hu, P., & Zhang, W. E. (2026). LLaVA-MS-PIT: Multi-Modal Schema-Guided Progressive Instruction Tuning for Multi-Modal Event Extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34692–34700. https://doi.org/10.1609/aaai.v40i41.40770