Using Group Membership Markers for Group Identification

Authors

  • Jean Gawron, San Diego State University
  • Dipak Gupta, San Diego State University
  • Kellen Stephens, San Diego State University
  • Ming-Hsiang Tsou, San Diego State University
  • Brian Spitzberg, San Diego State University
  • Li An, San Diego State University

DOI:

https://doi.org/10.1609/icwsm.v6i1.14336

Keywords:

white militant weblog classifier ranker

Abstract

We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and for prioritizing the data found. We compare three ranking systems: one employing a small hand-selected vocabulary based on group membership markers used by insiders to identify members and member properties (us) and outsiders and threats (them), one with a much larger vocabulary, and one with a small vocabulary chosen by Mutual Information. We use the same vocabularies to build classifiers. The ranker that achieves the best correlations with human judgments uses the small us-them vocabulary. We confirm and extend recent results in sentiment analysis (Paltoglou 2010), showing that a feature-weighting scheme taken from classical IR (TFIDF) produces the best ranking system; we also find, surprisingly, that adjusting these weights with SVM training, while producing a better classifier, produces a worse ranker. Increasing vocabulary size similarly improves classification while worsening ranking.
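The ranking idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the placeholder marker tokens, the smoothed IDF formula, the length normalization, and the use of Spearman rank correlation as the measure of agreement with human judgments (the abstract says only "correlations"). Documents are scored by the summed TFIDF weight of terms from a small fixed vocabulary, and the resulting scores are compared with human ratings.

```python
import math
from collections import Counter


def tfidf_scores(docs, vocabulary):
    """Score each tokenized document by its summed TF-IDF weight over `vocabulary`.

    Assumptions (not from the paper): smoothed IDF and division by document length.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each vocabulary term.
    df = Counter()
    for tokens in docs:
        for term in set(tokens) & vocabulary:
            df[term] += 1

    scores = []
    for tokens in docs:
        counts = Counter(t for t in tokens if t in vocabulary)
        total = 0.0
        for term, tf in counts.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            total += tf * idf
        # Normalize by length so long documents are not favored automatically.
        scores.append(total / max(len(tokens), 1))
    return scores


def rank_documents(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_ranks(values):
    """Assign 1-based ranks in ascending order, averaging ranks within ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: placeholder tokens stand in for us/them markers.
vocabulary = {"us_marker", "them_marker"}
docs = [
    ["us_marker", "us_marker", "them_marker", "filler"],
    ["filler", "other", "words", "here"],
    ["us_marker", "filler", "other", "words"],
]
human_ratings = [3, 1, 2]  # hypothetical human judgments, higher = more militant

scores = tfidf_scores(docs, vocabulary)
ranking = rank_documents(scores)
rho = spearman(scores, human_ratings)
```

In this sketch the weights are used directly, with no training step; a ranker with SVM-adjusted weights, which the abstract reports as ranking worse, would replace the fixed `tf * idf` weights with learned ones.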

Published

2021-08-03

How to Cite

Gawron, J., Gupta, D., Stephens, K., Tsou, M.-H., Spitzberg, B., & An, L. (2021). Using Group Membership Markers for Group Identification. Proceedings of the International AAAI Conference on Web and Social Media, 6(1), 467-470. https://doi.org/10.1609/icwsm.v6i1.14336