Pham, T. (2025) “Truth Behind the Scene: Designing Evaluations Benchmarks to Assess LLMs’ Task-Specific Understanding over Test-Taking Strategies”, Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), pp. 29596-29598. doi: 10.1609/aaai.v39i28.35337.