CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark

Authors

  • Marcus Ma, Georgia Institute of Technology
  • Duong Minh Le, Georgia Institute of Technology
  • Junmo Kang, Georgia Institute of Technology
  • Yao Dou, Georgia Institute of Technology
  • John Cadigan, SRI International
  • Dayne Freitag, SRI International
  • Alan Ritter, Georgia Institute of Technology
  • Wei Xu, Georgia Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v39i23.34659

Abstract

Authorship models have historically generalized poorly to new domains because of the wide distribution of author-identifying signals across domains. In particular, the effects of topic and genre are highly domain-dependent and impact authorship analysis performance greatly. This paper addresses the shortage of authorship data for studying these effects by introducing CROSSNEWS, a novel cross-genre dataset that connects formal journalistic articles and casual social media posts. CROSSNEWS is the largest authorship dataset of its kind for supporting both verification and attribution tasks, with comprehensive topic and genre annotations. We use CROSSNEWS to demonstrate that current models exhibit poor performance in genre transfer scenarios, underscoring the need for authorship models robust to genre-specific effects. We also explore SELMA, a new LLM embedding approach for large-scale authorship setups that outperforms existing models in both same-genre and cross-genre settings.
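
For readers unfamiliar with the two tasks named in the abstract, the following minimal sketch may help distinguish them. It is not SELMA and not the CROSSNEWS evaluation code: the character n-gram `embed` function is a toy stand-in for a learned embedding, and the function names, the `threshold` value of 0.5, and the sample texts are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def embed(text, n=3):
    """Toy stand-in for a learned embedding: character n-gram counts (assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def verify(text_a, text_b, threshold=0.5):
    """Verification: decide whether two texts appear to share an author.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return cosine(embed(text_a), embed(text_b)) >= threshold


def attribute(query, known_texts):
    """Attribution: return the candidate author whose known text is closest to the query."""
    q = embed(query)
    return max(known_texts, key=lambda author: cosine(q, embed(known_texts[author])))


if __name__ == "__main__":
    # Hypothetical cross-genre pair: a news-style sentence and a casual post.
    article = "The city council voted on Tuesday to approve the new transit budget."
    post = "council vote tonight!! the transit budget finally got approved"
    print(verify(article, post))
    print(attribute(post, {"author_a": article, "author_b": "zzz qqq xxx jjj"}))
```

In this sketch, verification is a binary decision over a pair of texts, while attribution is a ranking over a closed set of candidate authors. A cross-genre setting corresponds to the case where the two texts being compared come from different genres, such as an article and a social media post.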

Published

2025-04-11

How to Cite

Ma, M., Le, D. M., Kang, J., Dou, Y., Cadigan, J., Freitag, D., … Xu, W. (2025). CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24777–24785. https://doi.org/10.1609/aaai.v39i23.34659

Section

AAAI Technical Track on Natural Language Processing II