Wang, Zihan, et al. “ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 42, Mar. 2026, pp. 35829-37, doi:10.1609/aaai.v40i42.40897.