[1] Cheng, Z. et al. 2025. CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence. 39, 22 (Apr. 2025), 23678–23686. DOI:https://doi.org/10.1609/aaai.v39i22.34538.