Mind Your Language: Abuse and Offense Detection for Code-Switched Languages

Authors

  • Raghav Kapoor Netaji Subhas Institute of Technology
  • Yaman Kumar Indian Institute of Technology Delhi
  • Kshitij Rajput Netaji Subhas Institute of Technology
  • Rajiv Ratn Shah Indian Institute of Technology Delhi
  • Ponnurangam Kumaraguru Indian Institute of Technology Delhi
  • Roger Zimmermann National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v33i01.33019951

Abstract

In multilingual societies like the Indian subcontinent, the use of code-switched languages is popular and convenient for users. In this paper, we study offense and abuse detection in the code-switched pair of Hindi and English (i.e., Hinglish), the most widely spoken such pair. The task is made difficult by the non-fixed grammar, vocabulary, semantics, and spellings of Hinglish. We apply transfer learning and build an LSTM-based model for hate speech classification. This model surpasses the performance of the current best models, establishing itself as the state of the art in the unexplored domain of Hinglish offensive text classification. We also release our model and the trained embeddings for research purposes.

Published

2019-07-17

How to Cite

Kapoor, R., Kumar, Y., Rajput, K., Shah, R. R., Kumaraguru, P., & Zimmermann, R. (2019). Mind Your Language: Abuse and Offense Detection for Code-Switched Languages. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 9951-9952. https://doi.org/10.1609/aaai.v33i01.33019951

Section

Student Abstract Track