Identifying Effective Signals to Predict Deleted and Suspended Accounts on Twitter Across Languages
Social networks are ephemeral: accounts and messages are constantly being edited, deleted, or marked as private. This continuous change stems from privacy concerns, a potential desire to be forgotten, and suspicious behavior. In this study we present a novel task: predicting suspicious accounts in social media, e.g., those that will be deleted or suspended. We analyze multiple datasets of thousands of active, deleted, and suspended Twitter accounts to produce a series of predictive representations for the removal or shutdown of an account. We selected these accounts from speakers of three languages (Russian, Spanish, and English) to evaluate whether speakers of different languages behave differently with regard to deleting accounts. We compared the predictive power of state-of-the-art machine learning models to that of recurrent neural networks trained on previously unexplored features. Furthermore, this work is the first to rely on image and affect signals, in addition to language and network signals, to predict deleted and suspended accounts in social media. We found that, unlike widely used profile and network features, the discourse of deleted or suspended versus active accounts forms the basis for highly accurate prediction of account deletion and suspension. More precisely, we observed that the presence of certain terms in a user’s tweets is associated with a higher likelihood of that user’s account being deleted or suspended. Moreover, although image and affect signals yield lower predictive performance than language, they reveal interesting behavioral differences across speakers of different languages. Our extensive analysis and novel findings on the language use and suspicious behavior of speakers of different languages can improve existing approaches to credibility analysis and to disinformation and deception detection in social media.