Han, S., Fan, H., Fu, J., Li, L., Li, T., Cui, J., … Li, C. (2026). EvalMuse-40K: A Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Alignment Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4583–4591. https://doi.org/10.1609/aaai.v40i6.42458