[1] Z. Wang, “ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models”, AAAI, vol. 40, no. 42, pp. 35829–35837, Mar. 2026.