Xu, Z., Ding, J., Lou, Y., Zhang, K., Gong, D., & Li, Y. (2026). Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-Based Test Oracles. Proceedings of the AAAI Conference on Artificial Intelligence, 40(23), 19433–19440. https://doi.org/10.1609/aaai.v40i23.39021