RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian

Authors

  • Andrei-Marius Avram, National University of Science and Technology POLITEHNICA Bucharest
  • Aureliu-Valentin Antonie, National University of Science and Technology POLITEHNICA Bucharest
  • Cosmin-Mircea Croitoriu, National University of Science and Technology POLITEHNICA Bucharest
  • Vlad-Andrei Muntean, National University of Science and Technology POLITEHNICA Bucharest
  • Dumitru-Clementin Cercel, National University of Science and Technology POLITEHNICA Bucharest

DOI:

https://doi.org/10.1609/icwsm.v20i1.42777

Abstract

We present RoIt-XMASA, a multilingual dataset that extends the Cross-lingual Multi-domain Amazon Sentiment Analysis dataset to Italian and Romanian, comprising 36,000 labeled reviews across three domains (books, movies, and music) and 202,141 unlabeled samples. To address cross-lingual and cross-domain challenges, we propose a multi-target adversarial training framework that employs loss reversal with meta-learned coefficients to dynamically balance sentiment discrimination with domain and language invariance. With our approach, XLM-R achieves an F1-score of 66.23%, outperforming the baseline by 4.64%. In few-shot evaluation, Llama-3.1-8B achieves an F1-score of 58.43%, revealing a meaningful trade-off between the efficiency of prompting-based approaches and the higher performance of task-specific fine-tuning.
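
The loss-reversal idea mentioned in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the authors' implementation: it assumes the combined objective is the sentiment loss minus weighted domain and language classification losses, so that minimizing it favors sentiment discrimination while discouraging features that identify the domain or language. The function names, the logits, and the fixed coefficient values are all hypothetical; in the paper the coefficients are meta-learned, which this sketch does not attempt to reproduce.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def reversed_multi_target_loss(sent_loss, domain_loss, lang_loss,
                               lam_domain, lam_lang):
    """Sentiment loss with the two adversarial losses subtracted.

    Assumption: "loss reversal" is read here as flipping the sign of the
    domain and language losses. lam_domain and lam_lang stand in for the
    meta-learned coefficients and are plain constants in this sketch.
    """
    return sent_loss - lam_domain * domain_loss - lam_lang * lang_loss


# Hypothetical classifier outputs for one review.
sent = cross_entropy([2.0, -1.0], 0)       # 2 sentiment classes
dom = cross_entropy([0.1, 0.0, -0.1], 1)   # 3 domains: books, movies, music
lang = cross_entropy([0.0, 0.0], 1)        # 2 languages: Romanian, Italian

total = reversed_multi_target_loss(sent, dom, lang,
                                   lam_domain=0.1, lam_lang=0.1)
print(f"sentiment={sent:.4f} domain={dom:.4f} language={lang:.4f} total={total:.4f}")
```

Setting both coefficients to zero recovers plain sentiment training, while larger values put more weight on domain and language invariance; this is the balance the abstract says is tuned dynamically.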

Published

2026-05-25

How to Cite

Avram, A.-M., Antonie, A.-V., Croitoriu, C.-M., Muntean, V.-A., & Cercel, D.-C. (2026). RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2723–2734. https://doi.org/10.1609/icwsm.v20i1.42777