1. Pham T. Truth Behind the Scene: Designing Evaluations Benchmarks to Assess LLMs’ Task-Specific Understanding over Test-Taking Strategies. AAAI [Internet]. 2025 Apr 11 [cited 2026 Apr 25];39(28):29596-8. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/35337