POLAR: A Per-User Association Test in Embedding Space

Authors

  • Pedro Bento, Universidade Federal de Minas Gerais
  • Arthur Buzelin, Universidade Federal de Minas Gerais
  • Arthur Chagas, Universidade Federal de Minas Gerais
  • Yan Aquino, Universidade Federal de Minas Gerais - UFMG
  • Victoria Estanislau, UFMG
  • Samira Malaquias, UFMG
  • Pedro Robles Dutenhefner, UFMG
  • Gisele L. Pappa, Universidade Federal de Minas Gerais
  • Virgilio Almeida, Universidade Federal de Minas Gerais
  • Wagner Meira Jr., Universidade Federal de Minas Gerais

DOI:

https://doi.org/10.1609/icwsm.v20i1.42636

Abstract

Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Authors are represented by private deterministic tokens; POLAR projects these vectors onto curated lexical axes and reports standardized effects with permutation p-values and Benjamini–Hochberg control. On a balanced bot–human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise, per-author diagnostics for computational social science.
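
To make the pipeline in the abstract concrete, the following is a minimal sketch of one way a per-user association test with permutation p-values and Benjamini–Hochberg control could be set up. It is not the authors' implementation: it assumes a WEAT-style statistic (the difference in mean cosine similarity between a user vector and two poles of a lexical axis), and the function names `cosine`, `association`, and `benjamini_hochberg`, as well as the default parameters, are hypothetical.

```python
import numpy as np


def cosine(u, M):
    """Cosine similarity between vector u and each row of matrix M."""
    u = u / np.linalg.norm(u)
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    return M @ u


def association(user_vec, pole_a, pole_b, n_perm=2000, seed=0):
    """Assumed WEAT-style test for one user and one lexical axis.

    pole_a, pole_b: arrays of word embeddings, one row per lexicon term.
    Returns (standardized effect, two-sided permutation p-value).
    """
    rng = np.random.default_rng(seed)
    sims = cosine(user_vec, np.vstack([pole_a, pole_b]))
    n_a = len(pole_a)

    def stat(s):
        # Mean similarity to pole A minus mean similarity to pole B.
        return s[:n_a].mean() - s[n_a:].mean()

    observed = stat(sims)
    # Standardize by the spread of all similarities.
    effect = observed / sims.std(ddof=1)
    # Null distribution: shuffle which terms belong to which pole.
    perm = np.array([stat(rng.permutation(sims)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    p = (1 + np.sum(np.abs(perm) >= abs(observed))) / (1 + n_perm)
    return effect, p


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected under the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(passed)[0])
        reject[order[: k + 1]] = True
    return reject
```

In this sketch, `association` would be called once per user and axis, and the resulting p-values would then be passed together to `benjamini_hochberg` to control the false discovery rate across all tests.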

Published

2026-05-25

How to Cite

Bento, P., Buzelin, A., Chagas, A., Aquino, Y., Estanislau, V., Malaquias, S., … Meira Jr., W. (2026). POLAR: A Per-User Association Test in Embedding Space. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 250–261. https://doi.org/10.1609/icwsm.v20i1.42636