Xu Z, Ding J, Lou Y, Zhang K, Gong D, Li Y. Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-Based Test Oracles. AAAI [Internet]. 2026 Mar. 14 [cited 2026 Jul. 12];40(23):19433-40. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/39021