[1] Ji, K. et al. 2026. MedOmni-45°: A Safety–Performance Benchmark for Reasoning-Oriented LLMs in Medicine. Proceedings of the AAAI Conference on Artificial Intelligence. 40, 42 (Mar. 2026), 35536–35544. DOI:https://doi.org/10.1609/aaai.v40i42.40864.