Ji, K., Guo, Y., Zhang, Z., Zhu, X., Tian, Y., & Liu, N. (2026). MedOmni-45°: A Safety–Performance Benchmark for Reasoning-Oriented LLMs in Medicine. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 35536–35544. https://doi.org/10.1609/aaai.v40i42.40864