Han, S. (2026) “EvalMuse-40K: A Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Alignment Evaluation”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), pp. 4583–4591. doi: 10.1609/aaai.v40i6.42458.