Xu, Zihao, Junchen Ding, Yiling Lou, Kun Zhang, Dong Gong, and Yuekang Li. “Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models With Logic Programming-Based Test Oracles”. Proceedings of the AAAI Conference on Artificial Intelligence 40, no. 23 (March 14, 2026): 19433–19440. Accessed July 12, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/39021.