Sadiekh, S., Ericheva, E. and Agarwal, C. (2026) “Polarity-Aware Probing for Quantifying Latent Alignment in Language Models”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), pp. 37896–37903. doi: 10.1609/aaai.v40i44.41126.