Proceedings of the International AAAI Conference on Web and Social Media https://ojs.aaai.org/index.php/ICWSM <p>The proceedings of the International AAAI Conference on Web and Social Media (ICWSM) provide an archival record of the ICWSM conference — a forum where researchers from multiple disciplines come together to share knowledge, discuss ideas, exchange information, and learn about cutting-edge research in diverse fields with the common theme of online social media. This overall theme includes research on new perspectives in social theories, as well as computational algorithms for analyzing social media. ICWSM is a singularly fitting venue for research that blends social science and computational approaches to answer important and challenging questions about human social behavior through social media while advancing computational tools for vast and unstructured data.</p> en-US Mon, 05 Jun 2023 00:00:00 -0700 Erratum to: Rules and Rule-Making in the Five Largest Wikipedias https://ojs.aaai.org/index.php/ICWSM/article/view/27319 <div class="c-article-header"> <div class="u-mb-8 c-status-message c-status-message--boxed c-status-message--info"> <p class="u-mt-0"><em>The <a class="relation-link" href="https://doi.org/10.1609/icwsm.v16i1.19297" data-track="click" data-track-action="view linked article" data-track-label="link">Original Article</a> was published on 31 May 2023.</em></p> </div> </div> Sohyeon Hwang, Aaron Shaw Copyright (c) 2023 Proceedings of the International AAAI Conference on Web and Social Media https://ojs.aaai.org/index.php/ICWSM/article/view/27319 Mon, 10 Jul 2023 00:00:00 -0700 How Do US Congress Members Advertise Climate Change: An Analysis of Ads Run on Meta’s Platforms https://ojs.aaai.org/index.php/ICWSM/article/view/22121 Ensuring transparency and integrity in political communication on climate change has arguably never been more important than today.
Yet we know little about how politicians focus on, talk about, and portray climate change on social media. Here we study this question from the perspective of political advertising. We use Meta’s Ad Library to collect 602,546 ads issued by US Congress members since mid-2018. Of those, only 19,176 (3.2%) are climate-related. Analyzing this data, we find that Democrats focus substantially more on climate change than Republicans, with 99.7% of all climate-related ads stemming from Democratic politicians. In particular, we find this is driven by a small core of Democratic politicians: 72% of all impressions can be attributed to just 10 politicians. Interestingly, we find a significant difference between the two parties in the average number of impressions generated per dollar spent. Republicans generate, on average, 188% more impressions with their climate ads than Democrats do for the same money spent. We build models to explain the differences and find that demographic factors only partially explain the variance. Our results demonstrate differences in the climate-related advertising of US Congress members and reveal differences in advertising characteristics between the two political parties. We anticipate our work to be a starting point for further studies of climate-related ads on Meta’s platforms. Laurenz Aisenpreis, Gustav Gyrst, Vedran Sekara Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22121 Fri, 02 Jun 2023 00:00:00 -0700 The Pursuit of Peer Support for Opioid Use Recovery on Reddit https://ojs.aaai.org/index.php/ICWSM/article/view/22122 Individuals suffering from Opioid Use Disorder and other socially stigmatized conditions often rely on peer support groups to find comfort and motivation while treating their condition. Many may face barriers to accessing peer support treatment, such as shame and social stigma, seclusion, or mobility restrictions.
In this study, we quantitatively characterize the potential of the Reddit community in offering these individuals an online alternative to receiving peer support. By analyzing the social interactions of thousands of users during the start of opioid use recovery, we uncover that a particular Reddit community exhibits many characteristics similar to in-person peer support groups, featuring the exchange of support, trust, status, and similar experiences. We find that the supportive behavior of this community nudges users to change their personal behavior and promotes abandoning opioid-related communities in favor of recovery-oriented relationships. Finally, we find that recognition, acknowledgment, and knowledge exchange are the most relevant factors in sustained engagement with the recovery community. Given this evidence, we suggest that this online community may constitute a complement or a surrogate to peer support groups when in-person meetings are not desirable or possible. Our work might inspire harm reduction policies and interventions to favor successful rehabilitation and is fundamental for future research about the use of digital media for recovery support. Duilio Balsamo, Paolo Bajardi, Gianmarco De Francisci Morales, Corrado Monti, Rossano Schifanella Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22122 Fri, 02 Jun 2023 00:00:00 -0700 Exposure to Marginally Abusive Content on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22123 Social media platforms can help people find connection and entertainment, but they can also show potentially abusive content such as insults and targeted cursing. While platforms do remove some abusive content for rule violations, some content is considered "margin content" that does not violate any rules and thus stays on the platform.
This paper presents a focused analysis of exposure to such content on Twitter, asking (RQ1) how exposure to marginally abusive content varies across Twitter users, and (RQ2) how algorithmically-ranked timelines impact exposure to marginally abusive content. Based on one month of impression data from November 2021, descriptive analyses (RQ1) show significant variation in exposure, with more active users experiencing higher rates and higher volumes of marginal impressions. Experimental analyses (RQ2) show that users with algorithmically-ranked timelines experience slightly lower rates of marginal impressions. However, they tend to register more total impression activity and thus experience a higher cumulative volume of marginal impressions. The paper concludes by discussing implications of the observed concentration, the multifaceted impact of algorithmically-ranked timelines, and potential directions for future work. Jack Bandy, Tomo Lazovich Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22123 Fri, 02 Jun 2023 00:00:00 -0700 Finding Qs: Profiling QAnon Supporters on Parler https://ojs.aaai.org/index.php/ICWSM/article/view/22124 The social media platform "Parler" has emerged as a prominent fringe community where a significant share of the user base consists of self-reported supporters of QAnon, a far-right conspiracy theory alleging that a cabal of elites controls global politics. QAnon is considered to have had an influential role in the public discourse during the 2020 U.S. presidential election. However, little is known about QAnon supporters on Parler and what sets them apart from other users. Building on social identity theory, we aim to profile the characteristics of QAnon supporters on Parler. We analyze a large-scale dataset with more than 600,000 profiles of English-speaking users on Parler.
Based on users' profiles, posts, and comments, we then extract a comprehensive set of user features, linguistic features, network features, and content features. This allows us to perform user profiling and understand to what extent these features discriminate between QAnon and non-QAnon supporters on Parler. Our analysis is three-fold: (1) We quantify the number of QAnon supporters on Parler, finding that 34,913 users (5.5% of all users) openly report supporting the conspiracy. (2) We examine differences between QAnon and non-QAnon supporters. We find that QAnon supporters differ statistically significantly from non-QAnon supporters across multiple dimensions. For example, they have, on average, a larger number of followers, followees, and posts, and thus have a large impact on the Parler network. (3) We use machine learning to identify which user characteristics discriminate QAnon from non-QAnon supporters. We find that user features, linguistic features, network features, and content features can, to a large extent, discriminate QAnon from non-QAnon supporters on Parler. In particular, we find that user features are highly discriminatory, followed by content features and linguistic features. Dominik Bär, Nicolas Pröllochs, Stefan Feuerriegel Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22124 Fri, 02 Jun 2023 00:00:00 -0700 Predicting Future Location Categories of Users in a Large Social Platform https://ojs.aaai.org/index.php/ICWSM/article/view/22125 Understanding users' patterns of visiting various location categories can help online platforms improve content personalization and user experiences. Current literature on predicting future location categories of a user typically employs features that can be traced back to the user, such as spatial geo-coordinates and demographic identities.
Moreover, existing approaches commonly suffer from cold-start and generalization problems, and often cannot specify when the user will visit the predicted location category. In a large social platform, it is desirable for prediction models to avoid using user-identifiable data, generalize to unseen and new users, and be able to make predictions for specific times in the future. In this work, we construct a neural model, LocHabits, using data from Snapchat. The model omits user-identifiable inputs, leverages temporal and sequential regularities in the location category histories of Snapchat users and their friends, and predicts the users' next-hour location categories. We evaluate our model on several real-life, large-scale datasets from Snapchat and FourSquare, and find that the model outperforms baselines by 14.94% in accuracy. We confirm that the model can (1) generalize to unseen users from different areas and times, and (2) fall back on collective trends in the cold-start scenario. We also study the relative contributions of various factors in making the predictions and find that the users' visitation preferences and most-recent visitation sequences play more important roles than time contexts, same-hour sequences, and social influence features. Raiyan Abdul Baten, Yozen Liu, Heinrich Peters, Francesco Barbieri, Neil Shah, Leonardo Neves, Maarten W. Bos Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22125 Fri, 02 Jun 2023 00:00:00 -0700 Followback Clusters, Satellite Audiences, and Bridge Nodes: Coengagement Networks for the 2020 US Election https://ojs.aaai.org/index.php/ICWSM/article/view/22126 The 2020 United States (US) presidential election was — and has continued to be — the focus of pervasive and persistent mis- and disinformation spreading through our media ecosystems, including social media.
This event has driven the collection and analysis of large, directed social network datasets, but such datasets can resist intuitive understanding. In such large datasets, the overwhelming number of nodes and edges present in typical representations creates visual artifacts, such as densely overlapping edges and tightly-packed formations of low-degree nodes, which obscure many features of more practical interest. We apply a method, coengagement transformations, to convert such networks of social data into tractable images. Intuitively, this approach allows for parameterized network visualizations that make shared audiences of engaged users salient to viewers. Using the interpretative capabilities of this method, we perform an extensive case study of the 2020 United States presidential election on Twitter, contributing an empirical analysis of coengagement. By creating and contrasting networks at different parameter settings, we define and characterize several structures in this discourse network, including bridging accounts, satellite audiences, and followback communities. We discuss the importance and implications of these empirical network features in this context. In addition, we release open-source code for creating coengagement networks from Twitter and other structured interaction data. Andrew Beers, Joseph S. Schafer, Ian Kennedy, Morgan Wack, Emma S. Spiro, Kate Starbird Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22126 Fri, 02 Jun 2023 00:00:00 -0700 Measuring the Ideology of Audiences for Web Links and Domains Using Differentially Private Engagement Data https://ojs.aaai.org/index.php/ICWSM/article/view/22127 This paper demonstrates the use of differentially private hyperlink-level engagement data for measuring ideologies of audiences for web domains, individual links, or aggregations thereof.
We examine a simple metric for measuring this ideological position and assess the conditions under which the metric is robust to injected, privacy-preserving noise. This assessment provides insights into and constraints on the level of activity one should observe when applying this metric to privacy-protected data. Grounding this work is a massive dataset of social media engagement activity where privacy-preserving noise has been injected into the activity data, provided by Facebook and the Social Science One (SS1) consortium. Using this dataset, we validate our ideology measures by comparing them to similar published work on sharing-based, homophily- and content-oriented measures, showing consistently high correlation (>0.87). We then apply this metric to individual links from several popular news domains and demonstrate how one can assess link-level distributions of ideological audiences. We further show this estimator is robust to the selection of engagement types besides sharing, where domain-level audience-ideology assessments based on views and likes show no significant difference compared to sharing-based estimates. Estimates of partisanship, however, suggest the viewing audience is more moderate than the audiences who share and like these domains. Beyond providing thresholds on sufficient activity for measuring audience ideology and comparing three types of engagement, this analysis provides a blueprint for ensuring the robustness of future work to differential privacy protections. Cody Buntain, Richard Bonneau, Jonathan Nagler, Joshua A. Tucker Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22127 Fri, 02 Jun 2023 00:00:00 -0700 RTANet: Recommendation Target-Aware Network Embedding https://ojs.aaai.org/index.php/ICWSM/article/view/22128 Network embedding is a process of encoding nodes into latent vectors by preserving network structure and content information.
It is used in various applications, especially in recommender systems. In a social network setting, when recommending new friends to a user, the similarity between the user's embedding and that of the target friend is examined. Traditional methods generate user node embeddings without considering the recommendation target: no matter which target is to be recommended, the same embedding vector is generated for that particular user. This approach has its limitations. For example, a user can be both a computer scientist and a musician. When recommending music friends with potentially the same taste to this user, we want a representation that is useful for recommending music friends rather than computer scientists. The corresponding embedding should reflect the user's musical features rather than those associated with computer science, with the awareness that the recommendation targets are music friends. To address this issue, we propose a new framework, which we name the Recommendation Target-Aware Network embedding method (RTANet). Herein, the embedding of each user is no longer fixed to a constant vector but can vary according to the specific recommendation target. Concretely, RTANet assigns a different attention weight to each neighbour node, allowing us to obtain the user's context information aggregated from its neighbours before transforming this context into its embedding. Different from other graph attention approaches, the attention weights in our work measure the similarity between each of the user's neighbour nodes and the target node, which in turn generates the target-aware embedding. To demonstrate the effectiveness of our method, we compare RTANet with several state-of-the-art network embedding methods on four real-world datasets and show that RTANet outperforms the comparison methods in recommendation tasks.
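The target-aware aggregation idea in the RTANet abstract above can be illustrated with a minimal Python sketch: attention weights come from each neighbour's similarity to the recommendation target, so the same user receives different embeddings for different targets. The two-dimensional toy features and plain dot-product softmax attention are illustrative assumptions, not the authors' implementation.

```python
import math

def target_aware_embedding(neighbour_feats, target_feat):
    """Toy sketch of target-aware aggregation: attention weights are
    similarities between each neighbour and the recommendation target,
    so the same user gets different embeddings for different targets."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Attention score per neighbour = similarity to the target node.
    scores = [dot(f, target_feat) for f in neighbour_feats]
    # Softmax-normalise the scores into attention weights.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Context = attention-weighted sum of neighbour features.
    dim = len(target_feat)
    return [sum(w * f[i] for w, f in zip(weights, neighbour_feats))
            for i in range(dim)]

# Hypothetical neighbour features for one user (rows) and two targets.
neighbours = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
emb_music = target_aware_embedding(neighbours, [1.0, 0.0])  # "music" target
emb_cs = target_aware_embedding(neighbours, [0.0, 1.0])     # "CS" target
print(emb_music != emb_cs)  # True: the embedding varies with the target
```

Unlike a fixed node embedding, the weighted sum above shifts toward the neighbours most similar to whichever target is being considered.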
Qimeng Cao, Qing Yin, Yunya Song, Zhihua Wang, Yujun Chen, Richard Yi Da Xu, Xian Yang Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22128 Fri, 02 Jun 2023 00:00:00 -0700 Recipe Networks and the Principles of Healthy Food on the Web https://ojs.aaai.org/index.php/ICWSM/article/view/22129 People increasingly use the Internet to make food-related choices, prompting research on food recommendation systems. Recently, works that incorporate nutritional constraints into the recommendation process have been proposed to promote healthier recipes. Ingredient substitution is also used, particularly by people motivated to reduce the intake of a specific nutrient or to avoid a particular category of ingredients due, for instance, to allergies. This study takes a complementary approach towards empowering people to make healthier food choices by simplifying the process of identifying plausible recipe substitutions. To achieve this goal, this work constructs a large-scale network of similar recipes and analyzes this network to reveal interesting properties that have important implications for the development of food recommendation systems. Charalampos Chelmis, Bedirhan Gergin Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22129 Fri, 02 Jun 2023 00:00:00 -0700 Partisan US News Media Representations of Syrian Refugees https://ojs.aaai.org/index.php/ICWSM/article/view/22130 We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences.
Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims who are welcome in the US, while right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which detail possibly unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations and design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes. Keyu Chen, Marzieh Babaeianjelodar, Yiwen Shi, Kamila Janmohamed, Rupak Sarkar, Ingmar Weber, Thomas Davidson, Munmun De Choudhury, Jonathan Huang, Shweta Yadav, Ashiqur KhudaBukhsh, Chris T Bauch, Preslav Nakov, Orestis Papakyriakopoulos, Koustuv Saha, Kaveh Khoshnood, Navin Kumar Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22130 Fri, 02 Jun 2023 00:00:00 -0700 DiPPS: Differentially Private Propensity Scores for Bias Correction https://ojs.aaai.org/index.php/ICWSM/article/view/22131 In surveys, it is typically up to the individuals to decide if they want to participate or not, which leads to participation bias: the individuals willing to share their data might not be representative of the entire population. Similarly, there are cases where one does not have direct access to any data of the target population and has to resort to publicly available proxy data sampled from a different distribution. In this paper, we present Differentially Private Propensity Scores for Bias Correction (DiPPS), a method for approximating the true data distribution of interest in both of the above settings.
We assume that the data analyst has access to a dataset D' that was sampled from the distribution of interest in a biased way. As individuals may be more willing to share their data when given a privacy guarantee, we further assume that the analyst is allowed locally differentially private access to a set of samples D from the true, unbiased distribution. Each data point from the private, unbiased dataset D is mapped to a probability distribution over clusters (learned from the biased dataset D'), from which a single cluster is sampled via the exponential mechanism and shared with the data analyst. This way, the analyst gathers a distribution over clusters, which they use to compute propensity scores for the points in the biased D', which are in turn used to reweight the points in D' to approximate the true data distribution. It is now possible to compute any function on the resulting reweighted dataset without further access to the private D. In experiments on datasets from various domains, we show that DiPPS successfully brings the distribution of the available dataset closer to the distribution of interest in terms of Wasserstein distance. We further show that this results in improved estimates for different statistics, in many cases even outperforming differential privacy mechanisms that are specifically designed for these statistics. Liangwei Chen, Valentin Hartmann, Robert West Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22131 Fri, 02 Jun 2023 00:00:00 -0700 Getting Back on Track: Understanding COVID-19 Impact on Urban Mobility and Segregation with Location Service Data https://ojs.aaai.org/index.php/ICWSM/article/view/22132 Understanding the impact of COVID-19 on urban life rhythms is crucial for accelerating the return-to-normal progress and envisioning more resilient and inclusive cities. 
While previous studies either depended on small-scale surveys or focused on the response to initial lockdowns, this paper uses large-scale location service data to systematically analyze the urban mobility behavior changes across three distinct phases of the pandemic, i.e., pre-pandemic, lockdown, and reopening. Our analyses reveal two typical patterns that govern the mobility behavior changes in most urban venues: daily life-centered urban venues go through smaller mobility drops during the lockdown and more rapid recovery after reopening, while work-centered urban venues suffer from more significant mobility drops that are likely to persist even after reopening. Such mobility behavior changes exert deeper impacts on the underlying social fabric, where the level of mobility reduction is positively correlated with the experienced segregation at that urban venue. Therefore, urban venues undergoing more mobility reduction are also increasingly filled with people from homogeneous socio-demographic backgrounds. Moreover, mobility behavior changes display significant heterogeneity across geographical regions, which can be largely explained by partisan inclination at the state level. Our study shows the vast potential of location service data in deriving a timely and comprehensive understanding of the social dynamics in urban space, which is valuable for informing the gradual transition back to the normal lifestyle in a “post-pandemic era”. Lin Chen, Fengli Xu, Qianyue Hao, Pan Hui, Yong Li Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22132 Fri, 02 Jun 2023 00:00:00 -0700 What Are You Anxious About? Examining Subjects of Anxiety during the COVID-19 Pandemic https://ojs.aaai.org/index.php/ICWSM/article/view/22133 COVID-19 poses disproportionate mental health consequences to the public during different phases of the pandemic.
We use a computational approach to capture the specific aspects that trigger the public's anxiety about the pandemic and investigate how these aspects change over time. First, we identified nine subjects of anxiety (SOAs) in a sample of Reddit posts (N=86) from r/COVID19_support using the thematic analysis approach. Then, we quantified Reddit users' anxiety by training algorithms on a manually annotated sample (N=793) to annotate the SOAs in a larger chronological sample (N=6,535). The nine SOAs align with items in various recently developed pandemic anxiety measurement scales. We observed that Reddit users' concerns about health risks remained high during the first eight months after the pandemic started. These concerns diminished dramatically despite later surges in cases. In general, users' language disclosing the SOAs became less intense as the pandemic progressed. However, worries about mental health and the future steadily increased throughout the period covered in this study. People also tended to use more intense language to describe mental health concerns than health risk or death concerns. Our results suggest that the public's mental health condition does not necessarily improve even as COVID-19 gradually weakens as a health threat due to appropriate countermeasures. Our system lays the groundwork for population health and epidemiology scholars to examine aspects that provoke pandemic anxiety in a timely fashion. Lucia L. Chen, Steven R. Wilson, Sophie Lohmann, Daniela V. Negraia Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22133 Fri, 02 Jun 2023 00:00:00 -0700 Analyzing the Engagement of Social Relationships during Life Event Shocks in Social Media https://ojs.aaai.org/index.php/ICWSM/article/view/22134 Individuals experiencing unexpected distressing events, or shocks, often rely on their social network for support.
While prior work has shown how social networks respond to shocks, these studies usually treat all ties equally, despite differences in the support provided by different social relationships. Here, we conduct a computational analysis on Twitter that examines how responses to online shocks differ by the relationship type of a user dyad. We introduce a new dataset of over 13K instances of individuals self-reporting shock events on Twitter and construct networks of relationship-labeled dyadic interactions around these events. By examining behaviors across 110K replies to shocked users in a pseudo-causal analysis, we demonstrate relationship-specific patterns in response levels and topic shifts. We also show that while well-established social dimensions of closeness such as tie strength and structural embeddedness contribute to shock responsiveness, the degree of impact is highly dependent on relationship and shock types. Our findings indicate that social relationships contain highly distinctive characteristics in network interactions, and that relationship-specific behaviors in online shock responses are distinct from those of offline settings. Minje Choi, David Jurgens, Daniel M. Romero Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22134 Fri, 02 Jun 2023 00:00:00 -0700 Same Words, Different Meanings: Semantic Polarization in Broadcast Media Language Forecasts Polarity in Online Public Discourse https://ojs.aaai.org/index.php/ICWSM/article/view/22135 With the growth of online news over the past decade, empirical studies on political discourse and news consumption have focused on the phenomena of filter bubbles and echo chambers.
Yet recently, scholars have revealed limited evidence around the impact of such phenomena, leading some to argue that partisan segregation across news audiences cannot be fully explained by online news consumption alone and that the role of traditional legacy media may be as salient in polarizing public discourse around current events. In this work, we expand the scope of analysis to include both online and more traditional media by investigating the relationship between broadcast news media language and social media discourse. By analyzing a decade’s worth of closed captions (2.1 million speaker turns) from CNN and Fox News along with topically corresponding discourse from Twitter, we provide a novel framework for measuring semantic polarization between America’s two major broadcast networks to demonstrate how semantic polarization between these outlets has evolved (Study 1), peaked (Study 2), and influenced partisan discussions on Twitter (Study 3) across the last decade. Our results demonstrate a sharp increase in polarization in how topically important keywords are discussed between the two channels, especially after 2016, with the overall highest peaks occurring in 2020. The two stations discuss identical topics in drastically distinct contexts in 2020, to the extent that there is barely any linguistic overlap in how identical keywords are contextually discussed. Further, we demonstrate, at scale, how such partisan division in broadcast media language significantly shapes semantic polarity trends on Twitter (and vice versa), empirically linking, for the first time, how online discussions are influenced by televised media. We show how the language characterizing opposing media narratives about similar news events on TV can increase levels of partisan discourse online. To this end, our work has implications for how media polarization on TV plays a significant role in impeding rather than supporting online democratic discourse.
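As a crude illustration of what "barely any linguistic overlap" in keyword contexts can mean, the sketch below computes the Jaccard overlap of the words surrounding a shared keyword in two corpora. The toy transcripts and the window-based measure are hypothetical simplifications, not the semantic-polarization framework of the paper above.

```python
def context_overlap(corpus_a, corpus_b, keyword, window=2):
    """Jaccard overlap of the words surrounding `keyword` in two corpora.
    A value near 0 means the two sources discuss the keyword in almost
    entirely different contexts; near 1 means near-identical contexts."""
    def contexts(corpus):
        words = corpus.lower().split()
        ctx = set()
        for i, w in enumerate(words):
            if w == keyword:
                # Collect up to `window` words on each side of the keyword.
                ctx.update(words[max(0, i - window):i])
                ctx.update(words[i + 1:i + 1 + window])
        return ctx

    a, b = contexts(corpus_a), contexts(corpus_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical toy transcripts mentioning the same keyword.
cnn = "officials say the mask mandate protects public health"
fox = "critics say the mask mandate restricts personal freedom"
print(round(context_overlap(cnn, fox, "mask"), 2))  # → 0.6
```

In the toy example the two sources share three of five distinct context words around "mask"; with real transcripts and larger windows, scores approaching zero would correspond to the paper's finding of near-disjoint contexts in 2020.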
Xiaohan Ding, Michael Horning, Eugenia H. Rho Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22135 Fri, 02 Jun 2023 00:00:00 -0700 Catch Me If You Can: Deceiving Stance Detection and Geotagging Models to Protect Privacy of Individuals on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22136 The recent advances in natural language processing have yielded many exciting developments in text analysis and language understanding models; however, these models can also be used to track people, raising severe privacy concerns. In this work, we investigate what individuals can do to avoid being detected by those models while using social media platforms. We ground our investigation in two exposure-risky tasks, stance detection and geotagging. We explore a variety of simple techniques for modifying text, such as inserting typos in salient words, paraphrasing, and adding dummy social media posts. Our experiments show that the performance of BERT-based models fine-tuned for stance detection decreases significantly due to typos, but it is not affected by paraphrasing. Moreover, we find that typos have minimal impact on state-of-the-art geotagging models due to their increased reliance on social networks; however, we show that users can deceive those models by interacting with different users, reducing their performance by almost 50%. Dilara Dogan, Bahadir Altun, Muhammed Said Zengin, Mucahid Kutlu, Tamer Elsayed Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22136 Fri, 02 Jun 2023 00:00:00 -0700 We Are in This Together: Quantifying Community Subjective Wellbeing and Resilience https://ojs.aaai.org/index.php/ICWSM/article/view/22137 The COVID-19 pandemic disrupted everyone's life across the world.
In this work, we characterize the subjective wellbeing patterns of 112 cities across the United States during the pandemic prior to vaccine availability, as exhibited in subreddits corresponding to the cities. We quantify subjective wellbeing using positive and negative affect. We then measure the pandemic's impact by comparing a community's observed wellbeing with its expected wellbeing, as forecasted by time series models derived from data prior to the pandemic. We show that general community traits reflected in language can be predictive of community resilience. We predict how the pandemic would impact the wellbeing of each community based on linguistic and interaction features from normal times before the pandemic. We find that communities with interaction characteristics corresponding to more closely connected users and higher engagement were less likely to be significantly impacted. Notably, we find that communities that talked more about social ties normally experienced in-person, such as friends, family, and affiliations, were actually more likely to be impacted. Additionally, we also use the same features to predict how quickly each community would recover after the initial onset of the pandemic. We similarly find that communities that talked more about family, affiliations, and identifying as part of a group had a slower recovery. MeiXing Dong, Ruixuan Sun, Laura Biester, Rada Mihalcea Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22137 Fri, 02 Jun 2023 00:00:00 -0700 Non-polar Opposites: Analyzing the Relationship between Echo Chambers and Hostile Intergroup Interactions on Reddit https://ojs.aaai.org/index.php/ICWSM/article/view/22138 Previous research has documented the existence of both online echo chambers and hostile intergroup interactions. 
In this paper, we explore the relationship between these two phenomena by studying the activity of 5.97M Reddit users and 421M comments posted over 13 years. We examine whether users who are more engaged in echo chambers are more hostile when they comment on other communities. We then create a typology of relationships between political communities based on whether their users are toxic to each other, whether echo chamber-like engagement with these communities has a polarizing effect, and on the communities' political leanings. We observe both the echo chamber and hostile intergroup interaction phenomena, but neither holds universally across communities. Contrary to popular belief, we find that polarizing and toxic speech is more dominant between communities on the same, rather than opposing, sides of the political spectrum, especially on the left; however, this mostly points to the collective targeting of political outgroups. Alexandros Efstratiou, Jeremy Blackburn, Tristan Caulfield, Gianluca Stringhini, Savvas Zannettou, Emiliano De Cristofaro Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22138 Fri, 02 Jun 2023 00:00:00 -0700 Misleading Repurposing on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22139 We present the first in-depth and large-scale study of misleading repurposing, in which a malicious user changes the identity of their social media account via, among other things, changes to the profile attributes in order to use the account for a new purpose while retaining their followers. We propose a definition for the behavior and a methodology that uses supervised learning on data mined from the Internet Archive's Twitter Stream Grab to flag repurposed accounts. We found over 100,000 accounts that may have been repurposed. Of those, 28% were removed from the platform after 2 years, thereby confirming their inauthenticity. 
We also characterize repurposed accounts and find that they are more likely to be repurposed after a period of inactivity and deleting old tweets. We also provide evidence that adversaries target accounts with high follower counts to repurpose, and that some inflate accounts' follower counts by participating in follow-back schemes. The results we present have implications for the security and integrity of social media platforms, for data science studies in how historical data is considered, and for society at large in how users can be deceived about the popularity of an opinion. The data and code are available at https://github.com/tugrulz/MisleadingRepurposing. Tuğrulcan Elmas, Rebekah Overdorf, Karl Aberer Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22139 Fri, 02 Jun 2023 00:00:00 -0700 Scope of Pre-trained Language Models for Detecting Conflicting Health Information https://ojs.aaai.org/index.php/ICWSM/article/view/22140 An increasing number of people now rely on online platforms to meet their health information needs. Thus, identifying inconsistent or conflicting textual health information has become a safety-critical task. Health advice data poses a unique challenge where information that is accurate in the context of one diagnosis can be conflicting in the context of another. For example, people suffering from diabetes and hypertension often receive conflicting health advice on diet. This motivates the need for technologies which can provide contextualized, user-specific health advice. A crucial step towards contextualized advice is the ability to compare health advice statements and detect if and how they are conflicting. This is the task of health conflict detection (HCD). Given two pieces of health advice, the goal of HCD is to detect and categorize the type of conflict. 
It is a challenging task, as (i) automatically identifying and categorizing conflicts requires a deeper understanding of the semantics of the text, and (ii) the amount of available data is quite limited. In this study, we are the first to explore HCD in the context of pre-trained language models. We find that DeBERTa-v3 performs best with a mean F1 score of 0.68 across all experiments. We additionally investigate the challenges posed by different conflict types and how synthetic data improves a model's understanding of conflict-specific semantics. Finally, we highlight the difficulty in collecting real health conflicts and propose a human-in-the-loop synthetic data augmentation approach to expand existing HCD datasets. Our HCD training dataset is over twice the size of the existing HCD dataset and is made publicly available on GitHub. Joseph Gatto, Madhusudan Basak, Sarah Masud Preum Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22140 Fri, 02 Jun 2023 00:00:00 -0700 Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community https://ojs.aaai.org/index.php/ICWSM/article/view/22141 In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). These first person narratives are, in general, a unique storytelling domain where the author is not only the narrator (the person telling the story) but is also a character (the person living the story) and, thus, the author has two distinct voices presented in the story. In this study, we identify linguistic and narrative features associated with the author as the character or as a narrator. We use these features to answer the following questions: (1) what makes an asshole character and (2) what makes an asshole narrator? 
We extract both Author-as-Character features (e.g., demographics, narrative event chain, and emotional arc) and Author-as-Narrator features (i.e., the style and emotion of the story as a whole) in order to identify which aspects of the narrative are correlated with the final moral judgment. Our work shows that "assholes" as Characters frame themselves as lacking agency with a more positive personal arc, while "assholes" as Narrators will tell emotional and opinionated stories. Salvatore Giorgi, Ke Zhao, Alexander H. Feng, Lara J. Martin Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22141 Fri, 02 Jun 2023 00:00:00 -0700 Google the Gatekeeper: How Search Components Affect Clicks and Attention https://ojs.aaai.org/index.php/ICWSM/article/view/22142 The contemporary Google Search Engine Results Page (SERP) supplements classic blue hyperlinks with complex components. These components produce tensions between searchers, 3rd-party websites, and Google itself over clicks and attention. In this study, we examine 12 SERP components from two categories: (1) extracted results (e.g., featured-snippets) and (2) Google Services (e.g., shopping-ads) to determine their effect on people’s behavior. We measure behavior with two variables: (1) click-through rate (CTR) to Google’s own domains versus 3rd-party domains and (2) time spent on the SERP. We apply causal inference methods to an ecologically valid trace dataset comprising 477,485 SERPs from 1,756 participants. We find that multiple components substantially increase CTR to Google domains, while others decrease CTR and increase time on the SERP. These findings may inform efforts to regulate the design of powerful intermediary platforms like Google. Jeffrey Gleason, Desheng Hu, Ronald E. 
Robertson, Christo Wilson Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22142 Fri, 02 Jun 2023 00:00:00 -0700 Understanding and Detecting Hateful Content Using Contrastive Learning https://ojs.aaai.org/index.php/ICWSM/article/view/22143 The spread of hate speech and hateful imagery on the Web is a significant problem that needs to be mitigated to improve our Web experience. This work contributes to research efforts to detect and understand hateful content on the Web by undertaking a multimodal analysis of Antisemitism and Islamophobia on 4chan’s /pol/ using OpenAI’s CLIP. This large pre-trained model uses the Contrastive Learning paradigm. We devise a methodology to identify a set of Antisemitic and Islamophobic hateful textual phrases using Google’s Perspective API and manual annotations. Then, we use OpenAI’s CLIP to identify images that are highly similar to our Antisemitic/Islamophobic textual phrases. By running our methodology on a dataset that includes 66M posts and 5.8M images shared on 4chan’s /pol/ for 18 months, we detect 173K posts containing 21K Antisemitic/Islamophobic images and 246K posts that include 420 hateful phrases. Among other things, we find that we can use OpenAI’s CLIP model to detect hateful content with an accuracy score of 0.81 (F1 score = 0.54). By comparing CLIP with two baselines proposed by the literature, we find that CLIP outperforms them, in terms of accuracy, precision, and F1 score, in detecting Antisemitic/Islamophobic images. Also, we find that Antisemitic/Islamophobic imagery is shared in a similar number of posts on 4chan’s /pol/ compared to Antisemitic/Islamophobic textual phrases, highlighting the need to design more tools for detecting hateful imagery. 
Finally, we make available (upon request) a dataset of 246K posts containing 420 Antisemitic/Islamophobic phrases and 21K likely Antisemitic/Islamophobic images (automatically detected by CLIP) that can assist researchers in further understanding Antisemitism and Islamophobia. Felipe González-Pizarro, Savvas Zannettou Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22143 Fri, 02 Jun 2023 00:00:00 -0700 SciLander: Mapping the Scientific News Landscape https://ojs.aaai.org/index.php/ICWSM/article/view/22144 The COVID-19 pandemic has fueled the spread of misinformation on social media and the Web as a whole. The phenomenon dubbed `infodemic' has taken the challenges of information veracity and trust to new heights by massively introducing seemingly scientific and technical elements into misleading content. Despite the existing body of work on modeling and predicting misinformation, the coverage of very complex scientific topics with inherent uncertainty and an evolving set of findings, such as COVID-19, provides many new challenges that are not easily solved by existing tools. To address these issues, we introduce SciLander, a method for learning representations of news sources reporting on science-based topics. We extract four heterogeneous indicators for the sources; two generic indicators that capture (1) the copying of news stories between sources, and (2) the use of the same terms to mean different things (semantic shift), and two scientific indicators that capture (1) the usage of jargon and (2) the stance towards specific citations. We use these indicators as signals of source agreement, sampling pairs of positive (similar) and negative (dissimilar) samples, and combine them in a unified framework to train unsupervised news source embeddings with a triplet margin loss objective. 
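The triplet margin loss objective mentioned in the SciLander abstract above can be sketched generically as follows; this is a minimal NumPy illustration of the standard definition, not the authors' implementation (names and toy embeddings are hypothetical):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pulls the anchor embedding toward the positive
    (similar source) and pushes it away from the negative (dissimilar
    source) until they are separated by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])  # anchor source embedding
p = np.array([0.1, 0.0])  # a similar (agreeing) source
n = np.array([3.0, 0.0])  # a dissimilar source
print(triplet_margin_loss(a, p, n))  # 0.0: already separated by > margin
```

In training, such a loss would be minimized over many sampled (anchor, positive, negative) triples, with the source-agreement indicators deciding which pairs count as positive or negative.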
We evaluate our method on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources spanning a period of 18 months since the beginning of the pandemic in 2020. Our results show that the features learned by our model outperform state-of-the-art baseline methods on the task of news veracity classification. Furthermore, a clustering analysis suggests that the learned representations encode information about the reliability, political leaning, and partisanship bias of these sources. Maurício Gruppi, Panayiotis Smeros, Sibel Adalı, Carlos Castillo, Karl Aberer Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22144 Fri, 02 Jun 2023 00:00:00 -0700 A Data Fusion Framework for Multi-Domain Morality Learning https://ojs.aaai.org/index.php/ICWSM/article/view/22145 Language models can be trained to recognize the moral sentiment of text, creating new opportunities to study the role of morality in human life. As interest in language and morality has grown, several ground truth datasets with moral annotations have been released. However, these datasets vary in the method of data collection, domain, topics, instructions for annotators, etc. Simply aggregating such heterogeneous datasets during training can yield models that fail to generalize well. We describe a data fusion framework for training on multiple heterogeneous datasets that improves performance and generalizability. The model uses domain adversarial training to align the datasets in feature space and a weighted loss function to deal with label shift. We show that the proposed framework achieves state-of-the-art performance on different datasets compared to prior work on morality inference. 
Siyi Guo, Negar Mokhberian, Kristina Lerman Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22145 Fri, 02 Jun 2023 00:00:00 -0700 Representing and Determining Argumentative Relevance in Online Discussions: A General Approach https://ojs.aaai.org/index.php/ICWSM/article/view/22146 Understanding an online argumentative discussion is essential for understanding users' opinions on a topic and their underlying reasoning. A key challenge in determining completeness and persuasiveness of argumentative discussions is to assess how arguments under a topic are connected in a logical and coherent manner. Online argumentative discussions, in contrast to essays or face-to-face communication, challenge techniques for judging argument relevance because online discussions involve multiple participants and often exhibit incoherence in reasoning and inconsistencies in writing style. We define relevance as the logical and topical connections between small texts representing argument fragments in online discussions. We provide a corpus comprising pairs of sentences, labeled with argumentative relevance between the sentences in each pair. We propose a computational approach relying on content reduction and a Siamese neural network architecture for modeling argumentative connections and determining argumentative relevance between texts. Experimental results indicate that our approach is effective in measuring relevance between arguments, and outperforms strong and well-adopted baselines. Further analysis demonstrates the benefit of using our argumentative relevance encoding on a downstream task, predicting how impactful an online comment is on a certain topic, compared to an encoding that does not consider logical connections. Zhen Guo, Munindar P. 
Singh Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22146 Fri, 02 Jun 2023 00:00:00 -0700 The Morbid Realities of Social Media: An Investigation into the Narratives Shared by the Deceased Victims of COVID-19 https://ojs.aaai.org/index.php/ICWSM/article/view/22147 Social media platforms have had considerable impact on the real world, especially during the Covid-19 pandemic. Problematic narratives related to Covid-19 might have caused significant impact on the population, particularly due to their association with dangerous beliefs such as anti-vaccination and Covid denial. In this work, we study a unique dataset of Facebook posts by users who shared and believed in such narratives before succumbing to Covid-19, often resulting in death. We aim to characterize the dominant themes and sources present in the victims' posts along with identifying the role of the platform in handling deadly narratives. Our analysis reveals the overwhelming politicization of Covid-19 through the prevalence of anti-government themes propagated by the right-wing political and media ecosystem. Furthermore, we highlight the efforts of Facebook's implementation of soft moderation actions intended to warn users of misinformation. Results from this study bring insights into the responsibility of political elites in shaping public discourse and the platform's role in dampening the reach of harmful narratives. Hussam Habib, Rishab Nithyanand Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22147 Fri, 02 Jun 2023 00:00:00 -0700 Motif-Based Exploratory Data Analysis for State-Backed Platform Manipulation on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22148 State-backed platform manipulation (SBPM) on Twitter has been a prominent public issue since the 2016 US election cycle. 
Identifying and characterizing users on Twitter as belonging to a state-backed campaign is an important part of mitigating their influence. In this paper, we propose a novel time series feature grounded in social science to characterize dynamic user networks on Twitter. We introduce a classification approach, motif functional data analysis (MFDA), that captures the evolution of motifs in temporal networks, which is a useful feature for analyzing malign influence. We evaluate MFDA on data from known SBPM campaigns on Twitter and representative authentic data and compare performance to other classification methods. To further leverage our dynamic feature, we use the changes in network structure captured by motifs to help uncover real-world events using anomaly detection. Khuzaima Hameed, Rob Johnston, Brent Younce, Minh Tang, Alyson Wilson Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22148 Fri, 02 Jun 2023 00:00:00 -0700 Happenstance: Utilizing Semantic Search to Track Russian State Media Narratives about the Russo-Ukrainian War on Reddit https://ojs.aaai.org/index.php/ICWSM/article/view/22149 In the buildup to and in the weeks following the Russian Federation’s invasion of Ukraine, Russian state media outlets output torrents of misleading and outright false information. In this work, we study this coordinated information campaign in order to understand the most prominent state media narratives touted by the Russian government to English-speaking audiences. To do this, we first perform sentence-level topic analysis using the large-language model MPNet on articles published by ten different pro-Russian propaganda websites including the new Russian “fact-checking” website waronfakes.com. Within this ecosystem, we show that smaller websites like katehon.com were highly effective at publishing topics that were later echoed by other Russian sites. 
After analyzing this set of Russian information narratives, we then analyze their correspondence with narratives and topics of discussion on r/Russia and 10 other political subreddits. Using MPNet and a semantic search algorithm, we map these subreddits’ comments to the set of topics extracted from our set of Russian websites, finding that 39.6% of r/Russia comments corresponded to narratives from pro-Russian propaganda websites compared to 8.86% on r/politics. Hans W. A. Hanley, Deepak Kumar, Zakir Durumeric Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22149 Fri, 02 Jun 2023 00:00:00 -0700 "A Special Operation": A Quantitative Approach to Dissecting and Comparing Different Media Ecosystems’ Coverage of the Russo-Ukrainian War https://ojs.aaai.org/index.php/ICWSM/article/view/22150 The coverage of the Russian invasion of Ukraine has varied widely between Western, Russian, and Chinese media ecosystems with propaganda, disinformation, and narrative spins present in all three. By utilizing the normalized pointwise mutual information metric, differential sentiment analysis, word2vec models, and partially labeled Dirichlet allocation, we present a quantitative analysis of the differences in coverage amongst these three news ecosystems. We find that while the Western press outlets have focused on the military and humanitarian aspects of the war, Russian media have focused on the purported justifications for the “special military operation” such as the presence in Ukraine of “bio-weapons” and “neo-nazis”, and Chinese news media have concentrated on the conflict’s diplomatic and economic consequences. Detecting the presence of several Russian disinformation narratives in the articles of several Chinese media outlets, we finally measure the degree to which Russian media has influenced Chinese coverage across Chinese outlets’ news articles, Weibo accounts, and Twitter accounts. 
Our analysis indicates that since the Russian invasion of Ukraine, Chinese state media outlets have increasingly cited Russian outlets as news sources and spread Russian disinformation narratives. Hans W. A. Hanley, Deepak Kumar, Zakir Durumeric Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22150 Fri, 02 Jun 2023 00:00:00 -0700 The Geography of Facebook Groups in the United States https://ojs.aaai.org/index.php/ICWSM/article/view/22151 We present a de-identified and aggregated dataset based on geographical patterns of Facebook Groups usage and demonstrate its association with measures of social capital. The dataset is aggregated at United States county level. Established spatial measures of social capital are known to vary across US counties. Their availability and recency depend on running costly surveys. We examine to what extent a dataset based on usage patterns of Facebook Groups, which can be generated at regular intervals, could be used as a partial proxy by capturing local online associations. We identify four main latent factors that distinguish Facebook group engagement by county, obtained by exploratory factor analysis. The first captures small and private groups, dense with friendship connections. The second captures very local and small groups. The third captures non-local, large, public groups, with more age mixing. The fourth captures partially local groups of medium to large size. Only two of these factors, the first and third, correlate with offline community level social capital measures, while the second and fourth do not. Together and individually, the factors are predictive of offline social capital measures, even controlling for various demographic attributes of the counties. To our knowledge, this is the first systematic test of the association between offline regional social capital and patterns of online community engagement in the same regions. 
By making the dataset available to the research community, we hope to contribute to ongoing studies of social capital. Amaç Herdağdelen, Lada Adamic, Bogdan State Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22151 Fri, 02 Jun 2023 00:00:00 -0700 Quotatives Indicate Decline in Objectivity in U.S. Political News https://ojs.aaai.org/index.php/ICWSM/article/view/22152 According to journalistic standards, direct quotes should be attributed to sources with objective quotatives such as "said" and "told," since nonobjective quotatives, e.g., "argued" and "insisted," would influence the readers' perception of the quote and the quoted person. In this paper, we analyze the adherence to this journalistic norm to study trends in objectivity in political news across U.S. outlets of different ideological leanings. We ask: 1) How has the usage of nonobjective quotatives evolved? 2) How do news outlets use nonobjective quotatives when covering politicians of different parties? To answer these questions, we developed a dependency-parsing-based method to extract quotatives and applied it to Quotebank, a web-scale corpus of attributed quotes, obtaining nearly 7 million quotes, each enriched with the quoted speaker's political party and the ideological leaning of the outlet that published the quote. We find that, while partisan outlets are the ones that most often use nonobjective quotatives, between 2013 and 2020, the outlets that increased their usage of nonobjective quotatives the most were "moderate" centrist news outlets (around 0.6 percentage points, or 20% in relative percentage over seven years). 
Further, we find that outlets use nonobjective quotatives more often when quoting politicians of the opposing ideology (e.g., left-leaning outlets quoting Republicans) and that this "quotative bias" is rising at a swift pace, increasing up to 0.5 percentage points, or 25% in relative percentage, per year. These findings suggest an overall decline in journalistic objectivity in U.S. political news. Tiancheng Hu, Manoel Horta Ribeiro, Robert West, Andreas Spitz Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22152 Fri, 02 Jun 2023 00:00:00 -0700 Information Retention in the Multi-Platform Sharing of Science https://ojs.aaai.org/index.php/ICWSM/article/view/22153 The public interest in accurate scientific communication, underscored by recent public health crises, highlights how content often loses critical pieces of information as it spreads online. However, multi-platform analyses of this phenomenon remain limited due to challenges in data collection. Collecting mentions of research tracked by Altmetric LLC, we examine information retention in the over 4 million online posts referencing 9,765 of the most-mentioned scientific articles across blog sites, Facebook, news sites, Twitter, and Wikipedia. To do so, we present a burst-based framework for examining online discussions about science over time and across different platforms. To measure information retention, we develop a keyword-based computational measure comparing an online post to the scientific article's abstract. We evaluate our measure using ground truth data labeled by experts within the field. We highlight three main findings: first, we find a strong tendency towards low levels of information retention, following a distinct trajectory of loss except when bursts of attention begin in social media. Second, platforms show significant differences in information retention. 
Third, sequences involving more platforms tend to be associated with higher information retention. These findings highlight a strong tendency towards information loss over time, posing a critical concern for researchers, policymakers, and citizens alike, but suggest that multi-platform discussions may improve information retention overall. Sohyeon Hwang, Emőke-Ágnes Horvát, Daniel M. Romero Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22153 Fri, 02 Jun 2023 00:00:00 -0700 Measuring Belief Dynamics on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22154 There is growing concern about misinformation and the role online media plays in social polarization. Analyzing belief dynamics is one way to enhance our understanding of these problems. Existing analytical tools, such as survey research or stance detection, lack the power to correlate contextual factors with population-level changes in belief dynamics. In this exploratory study, I present the Belief Landscape Framework, which uses data about people's professed beliefs in an online setting to measure belief dynamics with more temporal granularity than previous methods. I apply the approach to conversations about climate change on Twitter and provide initial validation by comparing the method's output to a set of hypotheses drawn from the literature on dynamic systems. My analysis indicates that the method is relatively robust to different parameter settings, and results suggest that 1) there are many stable configurations of belief on the polarizing issue of climate change and 2) that people move in predictable ways around these points. The method paves the way for more powerful tools that can be used to understand how the modern digital media ecosystem impacts collective belief dynamics and what role misinformation plays in that process. 
Joshua Introne Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22154 Fri, 02 Jun 2023 00:00:00 -0700 Lady and the Tramp Nextdoor: Online Manifestations of Real-World Inequalities in the Nextdoor Social Network https://ojs.aaai.org/index.php/ICWSM/article/view/22155 From health to education, income impacts a huge range of life choices. Earlier research has leveraged data from online social networks to study precisely this impact. In this paper, we ask the opposite question: do different levels of income result in different online behaviors? We demonstrate it does. We present the first large-scale study of Nextdoor, a popular location-based social network. We collect 2.6 Million posts from 64,283 neighborhoods in the United States and 3,325 neighborhoods in the United Kingdom, to examine whether online discourse reflects the income and income inequality of a neighborhood. We show that posts from neighborhoods with different incomes indeed differ, e.g. richer neighborhoods have a more positive sentiment and discuss crimes more, even though their actual crime rates are much lower. We then show that user-generated content can predict both income and inequality. We train multiple machine learning models and predict both income (R2=0.841) and inequality (R2=0.77). Waleed Iqbal, Vahid Ghafouri, Gareth Tyson, Guillermo Suarez-Tangil, Ignacio Castro Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22155 Fri, 02 Jun 2023 00:00:00 -0700 Weakly Supervised Learning for Analyzing Political Campaigns on Facebook https://ojs.aaai.org/index.php/ICWSM/article/view/22156 Social media platforms are currently the main channel for political messaging, allowing politicians to target specific demographics and adapt based on their reactions. 
However, making this communication transparent is challenging, as the messaging is tightly coupled with its intended audience and often echoed by multiple stakeholders interested in advancing specific policies. Our goal in this paper is to take a first step towards understanding these highly decentralized settings. We propose a weakly supervised approach to identify the stance and issue of political ads on Facebook and analyze how political campaigns use demographic targeting by location, gender, or age. Furthermore, we analyze the temporal dynamics of political ads in relation to election polls. Tunazzina Islam, Shamik Roy, Dan Goldwasser Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22156 Fri, 02 Jun 2023 00:00:00 -0700 Online Emotions during the Storming of the U.S. Capitol: Evidence from the Social Media Network Parler https://ojs.aaai.org/index.php/ICWSM/article/view/22157 The storming of the U.S. Capitol on January 6, 2021 led to the deaths of five people and is widely regarded as an attack on democracy. The storming was largely coordinated through social media networks such as Twitter and "Parler". Yet little is known regarding how users interacted on Parler during the storming of the Capitol. In this work, we examine the emotion dynamics on Parler during the storming with regard to heterogeneity across time and users. For this, we segment the user base into different groups (e.g., Trump supporters and QAnon supporters). We use affective computing to infer the emotions in content, thereby allowing us to provide a comprehensive assessment of online emotions. Our evaluation is based on a large-scale dataset from Parler, comprising 717,300 posts from 144,003 users. We find that the user base responded to the storming of the Capitol with an overall negative sentiment. Similarly, Trump supporters also expressed a negative sentiment and high levels of disbelief. 
In contrast, QAnon supporters did not express a more negative sentiment during the storming. We further provide a cross-platform analysis and compare the emotion dynamics on Parler and Twitter. Our findings point to a comparatively less negative response to the incidents on Parler than on Twitter, accompanied by higher levels of disapproval and outrage. Our contribution to research is three-fold: (1) We identify online emotions that were characteristic of the storming; (2) we assess emotion dynamics across different user groups on Parler; (3) we compare the emotion dynamics on Parler and Twitter. Thereby, our work offers important implications for actively managing online emotions to prevent similar incidents in the future. Johannes Jakubik, Michael Vössing, Nicolas Pröllochs, Dominik Bär, Stefan Feuerriegel Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22157 Fri, 02 Jun 2023 00:00:00 -0700 Effect of Feedback on Drug Consumption Disclosures on Social Media https://ojs.aaai.org/index.php/ICWSM/article/view/22158 Deaths due to drug overdose in the US have doubled in the last decade. Drug-related content on social media has also exploded in the same time frame. The pseudo-anonymous nature of social media platforms enables users to discuss taboo and sometimes illegal topics like drug consumption. User-generated content (UGC) about drugs on social media can be used as an online proxy to detect offline drug consumption. UGC also gets exposed to the praise and criticism of the community. The law of effect proposes that positive reinforcement of an experience can incentivize users to engage in the experience repeatedly. Therefore, we hypothesize that positive community feedback on a user's online drug consumption disclosure will increase the probability of the user posting an online drug consumption disclosure again. 
To this end, we collect data from 10 drug-related subreddits. First, we build a deep learning model to classify UGC as indicative of drug consumption offline or not, and analyze the extent of such activities. Further, we use matching-based causal inference techniques to unravel community feedback's effect on users' future drug consumption behavior. We discover that 84% of posts and 55% of comments on drug-related subreddits indicate real-life drug consumption. Users who get positive feedback generate up to two times more drug consumption content in the future. Finally, we conduct an anonymous user study on drug-related subreddits to compare members' opinions with our experimental findings and show that users tend to underestimate the effect community peers can have on their decisions to interact with drugs. Hitkul Jangra, Rajiv Shah, Ponnurangam Kumaraguru Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22158 Fri, 02 Jun 2023 00:00:00 -0700 SexWEs: Domain-Aware Word Embeddings via Cross-Lingual Semantic Specialisation for Chinese Sexism Detection in Social Media https://ojs.aaai.org/index.php/ICWSM/article/view/22159 The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language -- Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system in order to make the most of existing data. 
Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese) to inject domain knowledge. We demonstrate the benefit of our sexist word embeddings (SexWEs) specialised by our framework via intrinsic evaluation of word similarity and extrinsic evaluation of sexism detection. Compared with other specialisation approaches and Chinese baseline word vectors, our SexWEs show an average score improvement of 0.033 and 0.064 in the intrinsic and extrinsic evaluations, respectively. The ablation results and visualisation of SexWEs also prove the effectiveness of our framework on retrofitting word vectors in low-resource languages. Aiqi Jiang, Arkaitz Zubiaga Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22159 Fri, 02 Jun 2023 00:00:00 -0700 Retweet-BERT: Political Leaning Detection Using Language Features and Information Diffusion on Social Networks https://ojs.aaai.org/index.php/ICWSM/article/view/22160 Estimating the political leanings of social media users is a challenging and ever more pressing problem given the increase in social media consumption. We introduce Retweet-BERT, a simple and scalable model to estimate the political leanings of Twitter users. Retweet-BERT leverages the retweet network structure and the language used in users' profile descriptions. Our assumptions stem from patterns of network and linguistic homophily among people who share similar ideologies. 
Retweet-BERT demonstrates competitive performance against other state-of-the-art baselines, achieving 96%-97% macro-F1 on two recent Twitter datasets (a COVID-19 dataset and a 2020 United States presidential elections dataset). We also manually validate the performance of Retweet-BERT on users not in the training data. Finally, in a case study of COVID-19, we illustrate the presence of political echo chambers on Twitter and show that they exist primarily among right-leaning users. Our code is open-sourced and our data is publicly available. Julie Jiang, Xiang Ren, Emilio Ferrara Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22160 Fri, 02 Jun 2023 00:00:00 -0700 Images, Emotions, and Credibility: Effect of Emotional Facial Expressions on Perceptions of News Content Bias and Source Credibility in Social Media https://ojs.aaai.org/index.php/ICWSM/article/view/22161 Images are an indispensable part of the news we consume. Highly emotional images from mainstream and misinformation sources can greatly influence our trust in the news. We present two studies on the effects of emotional facial images on users' perception of bias in news content and the credibility of sources. In study 1, we investigate the impact of repeated exposure to content with images containing positive or negative facial expressions on users’ judgements of source credibility and bias. In study 2, we focus on sources' systematic emotional portrayal of specific politicians. Our results show that the presence of negative (angry) facial emotions can lead to perceptions of higher bias in content. We also find that systematic negative portrayal of different politicians leads to lower perceptions of source credibility. These results highlight how implicit visual propositions manifested by emotions in facial expressions might have a substantial effect on our trust in news. 
Alireza Karduni, Ryan Wesslen, Douglas Markant, Wenwen Dou Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22161 Fri, 02 Jun 2023 00:00:00 -0700 InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks https://ojs.aaai.org/index.php/ICWSM/article/view/22162 As influencers play considerable roles in social media marketing, companies are increasing their budgets for influencer marketing. Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank, which ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period. An attentive recurrent neural network finally distinguishes highly effective influencers from other influencers by capturing the knowledge of the dynamics of influencer representations over time. Extensive experiments have been conducted on an Instagram dataset that consists of 18,397 influencers with their 2,952,075 posts published within 12 months. The experimental results demonstrate that InfluencerRank outperforms existing baseline methods. An in-depth analysis further reveals that all of our proposed features and model components are beneficial for discovering effective influencers. 
Seungbae Kim, Jyun-Yu Jiang, Jinyoung Han, Wei Wang Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22162 Fri, 02 Jun 2023 00:00:00 -0700 Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness https://ojs.aaai.org/index.php/ICWSM/article/view/22163 Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are responsive to community needs and desires. Here we examine this trade-off and concomitant preferences in the context of GetCalFresh, an online service that streamlines the application process for California’s Supplemental Nutrition Assistance Program (SNAP, formerly known as food stamps). GetCalFresh runs online advertisements to raise awareness of their multilingual SNAP application service. We first demonstrate that when ads are optimized to garner the most enrollments per dollar, a disproportionately small number of Spanish speakers enroll due to relatively higher costs of non-English language advertising. Embedding these results in a survey (N = 1,532) of a diverse set of Americans, we find broad popular support for valuing equity in addition to efficiency: respondents generally preferred reducing total enrollments to facilitate increased enrollment of Spanish speakers. These results buttress recent calls to reevaluate the efficiency-centric paradigm popular in algorithmic resource allocation. 
Allison Koenecke, Eric Giannella, Robb Willer, Sharad Goel Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22163 Fri, 02 Jun 2023 00:00:00 -0700 Personal History Affects Reference Points: A Case Study of Codeforces https://ojs.aaai.org/index.php/ICWSM/article/view/22164 Humans make decisions based on their internal value function, and its shape is known to be distorted and biased around a point, which the behavioral economics research community refers to as the reference point. People intensify activities that come to lie within the reach of their reference point, and abstain from acts that would incur losses once they've crossed the point. However, the impact of past experiences on decision making around the reference point has not been well studied. By analyzing a long series of user-level decisions gathered from a competitive programming website, we find that history has a clear impact on users' decision making around the reference point. Past experiences can strengthen, and sometimes weaken, the decision bias around the reference point. Experiences of past difficulties can strengthen the tendency towards loss aversion after achieving the reference point. When a person crosses a reference point for the first time, the cognitive decision bias is significant. However, repeating this crossing gradually weakens the effect. We also show the value of our insights in the task of predicting user behavior. Prediction models incorporating our insights may be used for motivating people to remain more active. 
Takeshi Kurashima, Tomoharu Iwata, Tomu Tominaga, Shuhei Yamamoto, Hiroyuki Toda, Kazuhisa Takemura Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22164 Fri, 02 Jun 2023 00:00:00 -0700 Large-Scale Demographic Inference of Social Media Users in a Low-Resource Scenario https://ojs.aaai.org/index.php/ICWSM/article/view/22165 Characterizing the demographics of social media users enables a diversity of applications, from better targeting of policy interventions to the derivation of representative population estimates of social phenomena. Achieving high performance with supervised learning, however, can be challenging as labeled data is often scarce. Alternatively, rule-based matching strategies provide well-grounded information but only offer partial coverage over users. It is unclear, therefore, what features and models are best suited to maximize coverage over a large set of users while maintaining high performance. In this paper, we develop a cost-effective strategy for large-scale demographic inference by relying on minimal labeling efforts. We combine a name-matching strategy with graph-based methods to map the demographics of 1.8 million Nigerian Twitter users. Specifically, we compare a purely graph-based propagation model, namely Label Propagation (LP), with Graph Convolutional Networks (GCN), a graph model that also incorporates node features based on user content. We find that both models largely outperform supervised learning approaches based purely on user content that lack graph information. Notably, we find that LP achieves comparable performance to the state-of-the-art GCN while providing greater interpretability at a lower computing cost. Moreover, performance does not significantly improve with the addition of user-specific features, such as textual representations of user tweets and user geolocation. 
Leveraging our data collection effort, we describe the demographic composition of Nigerian Twitter, finding that it is a highly non-uniform sample of the general Nigerian population. Karim Lasri, Manuel Tonneau, Haaya Naushan, Niyati Malhotra, Ibrahim Farouq, Víctor Orozco-Olvera, Samuel Fraiberger Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22165 Fri, 02 Jun 2023 00:00:00 -0700 Associative Inference Can Increase People’s Susceptibility to Misinformation https://ojs.aaai.org/index.php/ICWSM/article/view/22166 Associative inference is an adaptive, constructive process of memory that allows people to link related information to make novel connections. We conducted three online human-subjects experiments investigating participants’ susceptibility to associatively inferred misinformation and its interaction with their cognitive ability and how news articles were presented. In each experiment, participants completed recognition and perceived accuracy rating tasks for the snippets of news articles in a tweet format across two phases. At Phase 1, participants viewed real news only. At Phase 2, participants viewed both real and fake news. Critically, we varied whether the fake news at Phase 2 was inferred from (i.e., associative inference), associated with (i.e., association only), or irrelevant to (i.e., control) the corresponding real news pairs at Phase 1. Both recognition and perceived accuracy results showed that participants in the associative inference condition were more susceptible to fake news than those in the other conditions. Furthermore, hashtags embedded within the tweets made the obtained effects evident only for the participants of higher cognitive ability. Our findings reveal that associative inference can be a basis for individuals’ susceptibility to misinformation, especially for those of higher cognitive ability. 
We conclude by discussing the implications of our results for understanding and mitigating misinformation on social media platforms. Sian Lee, Haeseung Seo, Dongwon Lee, Aiping Xiong Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22166 Fri, 02 Jun 2023 00:00:00 -0700 Beyond Discrete Genres: Mapping News Items onto a Multidimensional Framework of Genre Cues https://ojs.aaai.org/index.php/ICWSM/article/view/22167 In the contemporary media landscape, with the vast and diverse supply of news, it is increasingly challenging to study such an enormous number of items without a standardized framework. Although attempts have been made to organize and compare news items on the basis of news values, news genres receive little attention, especially the genres in a news consumer’s perception. Yet, perceived news genres serve as an essential component in exploring how news has developed, as well as a precondition for understanding media effects. We approach this concept by conceptualizing and operationalizing a non-discrete framework for mapping news items in terms of genre cues. As a starting point, we propose a preliminary set of dimensions consisting of “factuality” and “formality”. To automatically analyze a large number of news items, we deliver two computational models for predicting news sentences along these two dimensions. Such predictions could then be used for locating news items within our framework. This proposed approach, which positions news items upon a multidimensional grid, helps deepen our insight into the evolving nature of news genres. 
Zilin Lin, Kasper Welbers, Susan Vermeer, Damian Trilling Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22167 Fri, 02 Jun 2023 00:00:00 -0700 "Learn the Facts about COVID-19": Analyzing the Use of Warning Labels on TikTok Videos https://ojs.aaai.org/index.php/ICWSM/article/view/22168 During the COVID-19 pandemic, health-related misinformation and harmful content shared online had a significant adverse effect on society. In an attempt to mitigate this adverse effect, mainstream social media platforms like Facebook, Twitter, and TikTok employed soft moderation interventions (i.e., warning labels) on potentially harmful posts. Such interventions aim to inform users about the post's content without removing it, hence easing the public's concerns about censorship and freedom of speech. Despite the recent popularity of these moderation interventions, as a research community, we lack empirical analyses aiming to uncover how these warning labels are used in the wild, particularly during challenging times like the COVID-19 pandemic. In this work, we analyze the use of warning labels on TikTok, focusing on COVID-19 videos. First, we construct a set of 26 COVID-19 related hashtags, and then we collect 41K videos that include those hashtags in their description. Second, we perform a quantitative analysis on the entire dataset to understand the use of warning labels on TikTok. Then, we perform an in-depth qualitative study, using thematic analysis, on 222 COVID-19 related videos to assess the content and the connection between the content and the warning labels. Our analysis shows that TikTok applies warning labels to videos broadly, likely based on hashtags included in the description (e.g., 99% of the videos that contain #coronavirus have warning labels). 
More worrying is the addition of COVID-19 warning labels to videos whose actual content is not related to COVID-19 (23% of the cases in a sample of 143 English videos that are not related to COVID-19). Finally, our qualitative analysis on a sample of 222 videos shows that 7.7% of the videos share misinformation/harmful content and do not include warning labels, 37.3% share benign information and include warning labels, and 35% of the videos that share misinformation/harmful content (and need a warning label) are made for fun. Our study demonstrates the need to develop more accurate and precise soft moderation systems, especially on a platform like TikTok that is extremely popular among younger users. Chen Ling, Krishna P. Gummadi, Savvas Zannettou Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22168 Fri, 02 Jun 2023 00:00:00 -0700 Improving Mental Health Classifier Generalization with Pre-diagnosis Data https://ojs.aaai.org/index.php/ICWSM/article/view/22169 Recent work has shown that classifiers for depression detection often fail to generalize to new datasets. Most NLP models for this task are built on datasets that use textual reports of a depression diagnosis (e.g., statements on social media) to identify diagnosed users; this approach allows for collection of large-scale datasets, but leads to poor generalization to out-of-domain data. Notably, models tend to capture features that typify direct discussion of mental health rather than more subtle indications of depression symptoms. In this paper, we explore the hypothesis that building classifiers using exclusively social media posts from before a user's diagnosis will lead to less reliance on shortcuts and better generalization. 
We test our classifiers on a dataset that is based on an external survey rather than textual self-reports, and find that using pre-diagnosis data for training yields improved performance with many types of classifiers. Yujian Liu, Laura Biester, Rada Mihalcea Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22169 Fri, 02 Jun 2023 00:00:00 -0700 Team Resilience under Shock: An Empirical Analysis of GitHub Repositories during Early COVID-19 Pandemic https://ojs.aaai.org/index.php/ICWSM/article/view/22170 While many organizations have shifted to working remotely during the COVID-19 pandemic, how the remote workforce and the remote teams are influenced by and would respond to this and future shocks remain largely unknown. Software developers have relied on remote collaborations long before the pandemic, working in virtual teams (GitHub repositories). The dynamics of these repositories through the pandemic provide a unique opportunity to understand how remote teams react under shock. This work presents a systematic analysis. We measure the overall effect of the early pandemic on public GitHub repositories by comparing their sizes and productivity with the counterfactual outcomes forecasted as if there were no pandemic. We find that the productivity level and the number of active members of these teams vary significantly during different periods of the pandemic. We then conduct a finer-grained investigation and study the heterogeneous effects of the shock on individual teams. We find that the resilience of a team is highly correlated to certain properties of the team before the pandemic. Through a bootstrapped regression analysis, we reveal which types of teams are robust or fragile to the shock. 
Xuan Lu, Wei Ai, Yixin Wang, Qiaozhu Mei Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22170 Fri, 02 Jun 2023 00:00:00 -0700 Contextualizing Online Conversational Networks https://ojs.aaai.org/index.php/ICWSM/article/view/22171 Online social connections occur within a specific conversational context. Prior work in network analysis of social media data attempts to contextualize data through filtering. We propose a method of contextualizing online conversational connections automatically and illustrate this method with Twitter data. Specifically, we detail a graph neural network model capable of representing tweets in a vector space based on their text, hashtags, URLs, and neighboring tweets. Once tweets are represented, clusters of tweets uncover conversational contexts. We apply our method to a dataset with 4.5 million tweets discussing the 2020 US election. We find that even filtered data contains many different conversational contexts, with users engaging in multiple conversations. However, the overlap between any pair of conversations tends to be only 30-40%, giving very different networks for different conversations. Even accounting for this variation, we show that the relative social status of users varies considerably across contexts, with tau=0.472 on average. Our findings imply that standard network analysis on social media data can be unreliable in the face of multiple conversational contexts. Thomas Magelinski, Kathleen M. 
Carley Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22171 Fri, 02 Jun 2023 00:00:00 -0700 Comfort Foods and Community Connectedness: Investigating Diet Change during COVID-19 Using YouTube Videos on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22172 Unprecedented lockdowns at the start of the COVID-19 pandemic have drastically changed the routines of millions of people, potentially impacting important health-related behaviors. In this study, we use YouTube videos embedded in tweets about diet, exercise and fitness posted before and during COVID-19 to investigate the influence of the pandemic lockdowns on diet and nutrition. In particular, we examine the nutritional profile of the foods mentioned in the transcript, description, and title of each video in terms of six macronutrients (protein, energy, fat, sodium, sugar, and saturated fat). These macronutrient values were further linked to demographics to assess if there are specific effects on those potentially having insufficient access to healthy sources of food. Interrupted time series analysis revealed a considerable shift in the aggregated macronutrient scores before and during COVID-19. In particular, whereas areas with lower incomes showed a decrease in energy, fat, and saturated fat, those with a higher percentage of African Americans showed an elevation in sodium. Word2Vec word similarities and odds ratio analysis suggested a shift from popular diets and lifestyle bloggers before the lockdowns to an interest in a variety of healthy foods, communal sharing of quick and easy recipes, as well as a new emphasis on comfort foods. To the best of our knowledge, this work is novel in terms of linking attention signals in tweets, content of videos, their nutrient profiles, and aggregate demographics of the users. 
The insights made possible by this combination of resources are important for monitoring the secondary health effects of social distancing, and informing social programs designed to alleviate these effects. Yelena Mejova, Lydia Manikonda Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22172 Fri, 02 Jun 2023 00:00:00 -0700 Authority without Care: Moral Values behind the Mask Mandate Response https://ojs.aaai.org/index.php/ICWSM/article/view/22173 Face masks are one of the cheapest and most effective non-pharmaceutical interventions available against airborne diseases such as COVID-19. Unfortunately, they have been met with resistance by a substantial fraction of the populace, especially in the U.S. In this study, we uncover the latent moral values that underpin the response to the mask mandate, and paint them against the country's political backdrop. We monitor the discussion about masks on Twitter, which involves almost 600k users in a time span of 7 months. By using a combination of graph mining, natural language processing, topic modeling, content analysis, and time series analysis, we characterize the responses to the mask mandate of both supporters and opponents. We base our analysis on the theoretical frameworks of Moral Foundation Theory and Hofstede's cultural dimensions. Our results show that, while the anti-mask stance is associated with a conservative political leaning, the moral values expressed by its adherents diverge from the ones typically used by conservatives. In particular, the expected emphasis on the values of authority and purity is accompanied by an atypical dearth of in-group loyalty. We find that after the mandate, both pro- and anti-mask sides decrease their emphasis on care about others, and increase their attention to authority and fairness, further politicizing the issue. 
In addition, the mask mandate reverses the expression of Individualism-Collectivism between the two sides, with an increase of individualism in the anti-mask narrative, and a decrease in the pro-mask one. We argue that monitoring the dynamics of moral positioning is crucial for designing effective public health campaigns that are sensitive to the underlying values of the target audience. Yelena Mejova, Kyriaki Kalimeri, Gianmarco De Francisci Morales Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22173 Fri, 02 Jun 2023 00:00:00 -0700 Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media https://ojs.aaai.org/index.php/ICWSM/article/view/22174 Social media enables the rapid spread of many kinds of information, from pop culture memes to social movements. However, little is known about how information crosses linguistic boundaries. We apply causal inference techniques on the European Twitter network to quantify the structural role and communication influence of multilingual users in cross-lingual information exchange. Overall, multilinguals play an essential role; posting in multiple languages increases betweenness centrality by 13%, and having a multilingual network neighbor increases monolinguals’ odds of sharing domains and hashtags from another language 16-fold and 4-fold, respectively. We further show that multilinguals have a greater impact on diffusing information that is less accessible to their monolingual compatriots, such as information from far-away countries and content about regional politics, nascent social movements, and job opportunities. By highlighting information exchange across borders, this work sheds light on a crucial component of how information and ideas spread around the world. 
Julia Mendelsohn, Sayan Ghosh, David Jurgens, Ceren Budak Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22174 Fri, 02 Jun 2023 00:00:00 -0700 Information Operations in Turkey: Manufacturing Resilience with Free Twitter Accounts https://ojs.aaai.org/index.php/ICWSM/article/view/22175 Following the 2016 US elections, Twitter launched its Information Operations (IO) hub, where it archives account activity connected to state-linked information operations. In June 2020, Twitter took down and released a set of accounts linked to Turkey's ruling political party (AKP). We investigate these accounts in the aftermath of the takedown to explore whether AKP-linked operations are ongoing and to understand the strategies they use to remain resilient to disruption. We collect live accounts that appear to be part of the same network, ~30% of which have been suspended by Twitter since our collection. We create a BERT-based classifier that shows similarity between these two networks, develop a taxonomy to categorize these accounts, find direct sequel accounts between the Turkish takedown and the live accounts, and find evidence that Turkish IO actors deliberately construct their network to withstand large-scale shutdown by utilizing explicit and implicit signals of coordination. We compare our findings from the Turkish operation to Russian and Chinese IO on Twitter and find that Turkey's IO utilizes a unique group structure to remain resilient. Our work highlights the fundamental imbalance between IO actors quickly and easily creating free accounts and the social media platforms spending significant resources on detection and removal, and contributes novel findings about Turkish IO on Twitter. 
Maya Merhi, Sarah Rajtmajer, Dongwon Lee Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22175 Fri, 02 Jun 2023 00:00:00 -0700 "This Is Fake News": Characterizing the Spontaneous Debunking from Twitter Users to COVID-19 False Information https://ojs.aaai.org/index.php/ICWSM/article/view/22176 False information spreads on social media, and fact-checking is a potential countermeasure. However, there is a severe shortage of fact-checkers; an efficient way to scale fact-checking is desperately needed, especially in pandemics like COVID-19. In this study, we focus on spontaneous debunking by social media users, which has been overlooked in existing research despite its indicated usefulness for fact-checking and countering false information. Specifically, we characterize the tweets with false information, or fake tweets, that tend to be debunked and Twitter users who often debunk fake tweets. For this analysis, we create a comprehensive dataset of responses to fake tweets, annotate a subset of them, and build a classification model for detecting debunking behaviors. We find that most fake tweets are left undebunked, spontaneous debunking is slower than other forms of responses, and spontaneous debunking exhibits partisanship in political topics. These results provide actionable insights into utilizing spontaneous debunking to scale conventional fact-checking, thereby supplementing existing research from a new perspective. Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, Jisun An, Haewoon Kwak, Kazutoshi Sasahara Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22176 Fri, 02 Jun 2023 00:00:00 -0700 Echo Tunnels: Polarized News Sharing Online Runs Narrow but Deep https://ojs.aaai.org/index.php/ICWSM/article/view/22177 Online social platforms afford users vast digital spaces to share and discuss current events. 
However, scholars have concerns both over their role in segregating information exchange into ideological echo chambers, and over evidence that these echo chambers are nonetheless overstated. In this work, we investigate news-sharing patterns across the entirety of Reddit and find that the platform appears polarized macroscopically, especially in politically right-leaning spaces. On closer examination, however, we observe that the majority of this effect originates from small, hyper-partisan segments of the platform accounting for a minority of news shared. We further map the temporal evolution of polarized news sharing and uncover evidence that, in addition to having grown drastically over time, polarization in hyper-partisan communities also began much earlier than 2016 and is resistant to Reddit's largest moderation event. Our results therefore suggest that polarized news sharing runs narrow but deep online. Rather than being guided by the general prevalence or absence of echo chambers, we argue that platform policies are better served by measuring and targeting the communities in which ideological segregation is strongest. Lillio Mok, Michael Inzlicht, Ashton Anderson Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22177 Fri, 02 Jun 2023 00:00:00 -0700 The Chance of Winning Election Impacts on Social Media Strategy https://ojs.aaai.org/index.php/ICWSM/article/view/22178 Social media has been a paramount arena for election campaigns for political actors. While many studies have been paying attention to the political campaigns related to partisanship, politicians can also conduct different campaigns according to their chances of winning. Leading candidates, for example, do not behave the same as fringe candidates in their elections, and vice versa. We, however, know little about this difference in social media political campaign strategies according to their odds in elections. 
We tackle this problem by analyzing candidates' tweets in terms of users, topics, and sentiment of replies. Our study finds that, as their chances of winning increase, candidates narrow the targets they communicate with, from people in general to their electoral districts and specific persons (verified accounts or accounts with many followers). Our study brings new insights into candidates' campaign strategies through an analysis based on the novel perspective of the candidate's electoral situation. Taichi Murayama, Akira Matsui, Kunihiro Miyazaki, Yasuko Matsubara, Yasushi Sakurai Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22178 Fri, 02 Jun 2023 00:00:00 -0700 BotBuster: Multi-Platform Bot Detection Using a Mixture of Experts https://ojs.aaai.org/index.php/ICWSM/article/view/22179 Despite rapid development, current bot detection models still face challenges in dealing with incomplete data and cross-platform applications. In this paper, we propose BotBuster, a social bot detector built with the concept of a mixture of experts approach. Each expert is trained to analyze a portion of account information, e.g. username, and the experts are combined to estimate the probability that the account is a bot. Experiments on 10 Twitter datasets show that BotBuster outperforms popular bot-detection baselines (avg F1=73.54 vs avg F1=45.12). This is accompanied by F1=60.04 on a Reddit dataset and F1=60.92 on an external evaluation set. Further analysis shows that only 36 posts are required for a stable bot classification. Investigation shows that bot post features have changed across the years and can be difficult to differentiate from human features, making bot detection a difficult and ongoing problem. Lynnette Hui Xian Ng, Kathleen M. 
Carley Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22179 Fri, 02 Jun 2023 00:00:00 -0700 "Dummy Grandpa, Do You Know Anything?": Identifying and Characterizing Ad Hominem Fallacy Usage in the Wild https://ojs.aaai.org/index.php/ICWSM/article/view/22180 Today, participating in discussions on online forums is extremely commonplace, and these discussions have started exerting a strong influence on the overall opinion of online users. Naturally, twisting the flow of the argument can have a strong impact on the minds of naive users, which in the long run might have socio-political ramifications, for example, winning an election or spreading targeted misinformation. Thus, these platforms are potentially highly vulnerable to malicious players who might act individually or as a cohort to breed fallacious arguments with a motive to sway public opinion. Ad hominem arguments are one of the most effective forms of such fallacies. Although a simple fallacy, it is effective enough to sway public debates in the offline world and can be used as a precursor to shutting down the voice of opposition by slander. In this work, we take a first step in shedding light on the usage of ad hominem fallacies in the wild. First, we build a powerful ad hominem detector based on a transformer architecture with high accuracy (F1 more than 83%, showing a significant improvement over prior work), even for datasets for which annotated instances constitute a very small fraction. We then used our detector on 265k arguments collected from the online debate forum – CreateDebate. Our crowdsourced surveys validate our in-the-wild predictions on CreateDebate data (94% match with manual annotation). Our analysis revealed that a surprising 31.23% of CreateDebate content contains ad hominem fallacies, and a cohort of highly active users post significantly more ad hominem arguments to suppress opposing views. 
Then, our temporal analysis revealed that ad hominem argument usage has increased significantly since the 2016 US Presidential election, not only for topics like Politics, but also for Science and Law. We conclude by discussing important implications of our work for detecting and defending against ad hominem fallacies. Utkarsh Patel, Animesh Mukherjee, Mainack Mondal Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22180 Fri, 02 Jun 2023 00:00:00 -0700 On the Relation between Opinion Change and Information Consumption on Reddit https://ojs.aaai.org/index.php/ICWSM/article/view/22181 While much attention has been devoted to the causes of opinion change, little is known about its consequences. Our study takes a first step in this direction by looking at Reddit, in particular at the subreddit r/ChangeMyView, a community dedicated to debating one’s own opinions on a wide array of topics. We analyze changes in online information consumption behavior that arise after a self-reported opinion change, by looking at participation in a set of sociopolitical communities. We find that people who self-report an opinion change are significantly more likely to change their future participation in a specific subset of those communities. Specifically, there is a significant association (Pearson r = 0.46) between using propaganda-like language in a community and the increase in chances of leaving it. Comparable results (Pearson r = 0.39) hold for the opposite direction, i.e., joining these same communities. In addition, the textual content of the post associated with opinion change is indicative of which communities will be joined or left: a predictive model based only on the text of this post can pinpoint these communities with an average precision@5 of 0.20. 
Our results establish a link between opinion change and information consumption, and highlight how online propagandistic communities act as a first gateway to internalize a shift in one’s sociopolitical opinion. Flavio Petruzzellis, Francesco Bonchi, Gianmarco De Francisci Morales, Corrado Monti Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22181 Fri, 02 Jun 2023 00:00:00 -0700 This Sample Seems to Be Good Enough! Assessing Coverage and Temporal Reliability of Twitter’s Academic API https://ojs.aaai.org/index.php/ICWSM/article/view/22182 Because of its willingness to share data with academia and industry, Twitter has been the primary social media platform for scientific research as well as for consulting businesses and governments in the last decade. In recent years, a series of publications have studied and criticized Twitter's APIs, and Twitter has partially adapted its existing data streams. The newest Twitter API for Academic Research allows researchers to "access Twitter's real-time and historical public data with additional features and functionality that support collecting more precise, complete, and unbiased datasets." The main new feature of this API is the possibility of accessing the full archive of all historic Tweets. In this article, we will take a closer look at the Academic API and will try to answer two questions. First, are the datasets collected with the Academic API complete? Secondly, since Twitter's Academic API delivers historic Tweets as represented on Twitter at the time of data collection, we need to understand how much data is lost over time due to Tweet and account removal from the platform. Our work shows evidence that Twitter's Academic API can indeed create (almost) complete samples of Twitter data based on a wide variety of search terms. We also provide evidence that Twitter's data endpoint v2 delivers better samples than the previously used endpoint v1.1. 
Furthermore, collecting Tweets with the Academic API at the time of studying a phenomenon, rather than creating local archives of stored Tweets, allows for a straightforward way of following Twitter's developer agreement. Finally, we will also discuss technical artifacts and implications of the Academic API. We hope that our work can add another layer of understanding of Twitter data collections, leading to more reliable studies of human behavior via social media data. Jürgen Pfeffer, Angelina Mooseder, Jana Lasser, Luca Hammer, Oliver Stritzel, David Garcia Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22182 Fri, 02 Jun 2023 00:00:00 -0700 The Geometry of Misinformation: Embedding Twitter Networks of Users Who Spread Fake News in Geometrical Opinion Spaces https://ojs.aaai.org/index.php/ICWSM/article/view/22183 To understand why internet users spread fake news online, many studies have focused on individual drivers, such as cognitive skills, media literacy, or demographics. Recent findings have also shown the role of complex socio-political dynamics, highlighting that political polarization and ideologies are closely linked to a propensity to participate in the dissemination of fake news. Most of the existing empirical studies have focused on the US example by exploiting the self-reported or solicited positioning of users on a dichotomous scale opposing liberals with conservatives. Yet, left-right polarization alone is insufficient to study socio-political dynamics when considering non-binary and multi-dimensional party systems, in which relevant ideological stances must be characterized in additional dimensions, relating for example to opposition to elites, government, political parties, or mainstream media. 
In this article, we leverage ideological embeddings of Twitter networks in France in multi-dimensional opinion spaces, where dimensions stand for attitudes towards different issues, and we trace the positions of users who shared articles that were rated as misinformation by fact-checkers. In multi-dimensional settings, and in contrast with the US, opinion dimensions capturing attitudes towards elites are more predictive of whether a user shares misinformation. Most users sharing misinformation hold salient anti-elite sentiments and, among them, more so those with radical left- and right-leaning stances. Our results reinforce the importance of enriching one-dimensional left-right analyses, showing that other ideological dimensions, such as anti-elite sentiment, are critical when characterizing users who spread fake news. This lends support to emerging accounts of social drivers of misinformation through political polarization, but also stresses the role of the entanglement between fake news, anti-elite polarization, and the role of scientific authorities in public debate. Pedro Ramaciotti Morales, Manon Berriche, Jean-Philippe Cointet Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22183 Fri, 02 Jun 2023 00:00:00 -0700 Spillover of Antisocial Behavior from Fringe Platforms: The Unintended Consequences of Community Banning https://ojs.aaai.org/index.php/ICWSM/article/view/22184 Online platforms face pressure to keep their communities civil and respectful. Thus, banning problematic online communities from mainstream platforms is often met with enthusiastic public reactions. However, this policy can lead users to migrate to alternative fringe platforms with lower moderation standards and may reinforce antisocial behaviors. As users of these communities often remain co-active across mainstream and fringe platforms, antisocial behaviors may spill over onto the mainstream platform. 
We study this possible spillover by analyzing 70,000 users from three banned communities that migrated to fringe platforms: r/The_Donald, r/GenderCritical, and r/Incels. Using a difference-in-differences design, we contrast co-active users with matched counterparts to estimate the causal effect of fringe platform participation on users' antisocial behavior on Reddit. Our results show that participating in the fringe communities increases users' toxicity on Reddit (as measured by Perspective API) and involvement with subreddits similar to the banned community, which often also breach platform norms. The effect intensifies with time and exposure to the fringe platform. In short, we find evidence for a spillover of antisocial behavior from fringe platforms onto Reddit via co-participation. Giuseppe Russo, Luca Verginer, Manoel Horta Ribeiro, Giona Casiraghi Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22184 Fri, 02 Jun 2023 00:00:00 -0700 Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios https://ojs.aaai.org/index.php/ICWSM/article/view/22185 Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most of the studies on this topic have focused on the analysis of data for a particular type of event in a specific language. This limits the possibility of generalizing existing approaches because models cannot be directly applied to new types of events or other languages. In this work, we study the task of automatically classifying messages that are related to crisis events by leveraging cross-language and cross-domain labeled data. 
Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or of new (previously unseen) types of crisis situations. For our study we consolidated from the literature a large unified dataset containing multiple crisis events and languages. Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian (80.0% F1-score). Furthermore, we achieve good performance for the cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our work contributes to alleviating the data scarcity problem that is so important for multilingual crisis classification. In particular, it helps mitigate cold-start situations in emergency events, when time is of the essence. Cinthia Sánchez, Hernan Sarmiento, Andres Abeliuk, Jorge Pérez, Barbara Poblete Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22185 Fri, 02 Jun 2023 00:00:00 -0700 How Much User Context Do We Need? Privacy by Design in Mental Health NLP Applications https://ojs.aaai.org/index.php/ICWSM/article/view/22186 Clinical NLP tasks, such as mental health assessment from text, must take social constraints into account: performance maximization must be constrained by the utmost importance of guaranteeing the privacy of user data. Consumer protection regulations, such as GDPR, generally handle privacy by restricting data availability, such as requiring user data to be limited to 'what is necessary' for a given purpose. In this work, we reason that providing stricter formal privacy guarantees, while increasing the volume of user data in the model, in most cases increases benefit for all parties involved, especially for the user. We demonstrate our arguments on two existing suicide risk assessment datasets of Twitter and Reddit posts. 
We present the first analysis juxtaposing user history length and differential privacy budgets and elaborate how modeling additional user context enables utility preservation while maintaining acceptable user privacy guarantees. Ramit Sawhney, Atula Neerkaje, Ivan Habernal, Lucie Flek Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22186 Fri, 02 Jun 2023 00:00:00 -0700 Effects of Algorithmic Trend Promotion: Evidence from Coordinated Campaigns in Twitter’s Trending Topics https://ojs.aaai.org/index.php/ICWSM/article/view/22187 In addition to more personalized content feeds, some leading social media platforms give a prominent role to content that is more widely popular. On Twitter, "trending topics" identify popular topics of conversation on the platform, thereby promoting popular content which users might not have otherwise seen through their network. Hence, "trending topics" potentially play important roles in influencing the topics users engage with on a particular day. Using two carefully constructed data sets from India and Turkey, we study the effects of a hashtag appearing on the trending topics page on the number of tweets produced with that hashtag. We specifically aim to answer the question: How many new tweets using a hashtag appear because the hashtag is labeled as trending? We distinguish the effects of the trending topics page from network exposure and find there is a statistically significant, but modest, return to a hashtag being featured on trending topics. Analysis of the types of users impacted by trending topics shows that the feature helps less popular and new users to discover and spread content outside their network, which they otherwise might not have been able to do. 
Joseph Schlessinger, Kiran Garimella, Maurice Jakesch, Dean Eckles Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22187 Fri, 02 Jun 2023 00:00:00 -0700 Detecting Anti-vaccine Users on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22188 Vaccine hesitancy, which has recently been driven by online narratives, significantly degrades the efficacy of vaccination strategies, such as those for COVID-19. Despite broad agreement in the medical community about the safety and efficacy of available vaccines, a large number of social media users continue to be inundated with false information about vaccines and are indecisive or unwilling to be vaccinated. The goal of this study is to better understand anti-vaccine sentiment by developing a system capable of automatically identifying the users responsible for spreading anti-vaccine narratives. We introduce a publicly available Python package capable of analyzing Twitter profiles to assess how likely that profile is to share anti-vaccine sentiment in the future. The software package is built using text embedding methods, neural networks, and automated dataset generation and is trained on several million tweets. We find this model can accurately detect anti-vaccine users up to a year before they tweet anti-vaccine hashtags or keywords. We also show examples of how text analysis helps us understand anti-vaccine discussions by detecting moral and emotional differences between anti-vaccine spreaders on Twitter and regular users. Our results will help researchers and policy-makers understand how users become anti-vaccine and what they discuss on Twitter. Policy-makers can utilize this information for better targeted campaigns that debunk harmful anti-vaccination myths. 
Matheus Schmitz, Goran Muric, Keith Burghardt Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22188 Fri, 02 Jun 2023 00:00:00 -0700 Cybersecurity Misinformation Detection on Social Media: Case Studies on Phishing Reports and Zoom’s Threat https://ojs.aaai.org/index.php/ICWSM/article/view/22189 Prior work has extensively studied misinformation related to news, politics, and health; however, misinformation can also be about technological topics. While less controversial, such misinformation can severely impact companies’ reputations and revenues, and users’ online experiences. Recently, social media has also been increasingly used as a novel knowledge base for extracting timely and relevant security threats, which are fed to threat intelligence systems for better performance. However, with possible campaigns spreading false security threats, these systems can become vulnerable to poisoning attacks. In this work, we proposed novel approaches for detecting misinformation about cybersecurity and privacy threats on social media, focusing on two topics with different types of misinformation: phishing websites and Zoom’s security & privacy threats. We developed a framework for detecting inaccurate phishing claims on Twitter. Using this framework, we could label about 9% of URLs and 22% of phishing reports as misinformation. We also proposed another framework for detecting misinformation related to Zoom’s security and privacy threats on multiple platforms. Our classifiers showed strong performance with more than 98% accuracy. Employing these classifiers on the posts from Facebook, Instagram, Reddit, and Twitter, we found respectively that about 18%, 3%, 4%, and 3% of posts were misinformation. In addition, we studied the characteristics of misinformation posts, their authors, and their timelines, which helped us identify campaigns. 
Mohit Singhal, Nihal Kumarswamy, Shreyasi Kinhekar, Shirin Nilizadeh Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22189 Fri, 02 Jun 2023 00:00:00 -0700 Characterizing and Identifying Socially Shared Self-Descriptions in Product Reviews https://ojs.aaai.org/index.php/ICWSM/article/view/22190 Online e-commerce product reviews can be highly influential in a customer's decision-making processes. Reviews often describe personal experiences with a product and provide candid opinions about a product's pros and cons. In some cases, reviewers choose to share information about themselves, just as they might do in social platforms. These descriptions are a valuable source of information about who finds a product most helpful. Customers benefit from key insights about a product from people with the same interests, and sellers might use the information to better serve their customers' needs. In this work, we present a comprehensive look into voluntary self-descriptive information found in public customer reviews. We analyzed what people share about themselves and how this contributes to their product opinions. We developed a taxonomy of types of self-descriptions, and a machine-learned classification model of reviews according to this taxonomy. We present new quantitative findings, and a thematic study of the perceived purpose of these descriptions in reviews. Lu Sun, F. Maxwell Harper, Chia-Jung Lee, Vanessa Murdock, Barbara Poblete Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22190 Fri, 02 Jun 2023 00:00:00 -0700 Social Influence-Maximizing Group Recommendation https://ojs.aaai.org/index.php/ICWSM/article/view/22191 In this paper, we revisit the group recommendation problem by taking into consideration information diffusion in a social network as one of the main criteria that must be maximised. 
While the well-known influence maximization problem has the objective of selecting k users (spread seeds) from a social network so that a piece of information can spread to the largest possible number of people in the network, in our setting the seeds are known (given as a group), and we must decide which k items (pieces of information) should be recommended to them. Therefore, the recommended items should at the same time be the best match for that group's preferences, and have the potential to spread as much as possible in an underlying diffusion network, to which the group members (the seeds) belong. This problem is directly motivated by group recommendation scenarios where social networking is an inherent dimension that must be taken into account when assessing the potential impact of a certain recommendation. We present the model and formulate the problem of influence-aware group recommendation as a multiple objective optimization problem. We then describe a greedy approach for this problem and we design an optimisation approach by adapting the top-k algorithms NRA and TA. We evaluate all these methods experimentally, in three different recommendation scenarios, for movie, micro-blog and book recommendations, based on real-world datasets from Flixster, Twitter, and Douban, respectively. Unsurprisingly, with the introduction of information diffusion as an optimization criterion for group recommendation, the recommendation problem becomes more complex. However, we show that our algorithms enable spread efficiency without loss of recommendation precision, under reasonable latency. Yangke Sun, Bogdan Cautis, Silviu Maniu Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22191 Fri, 02 Jun 2023 00:00:00 -0700 Top-Down Influence? 
Predicting CEO Personality and Risk Impact from Speech Transcripts https://ojs.aaai.org/index.php/ICWSM/article/view/22192 How much does a CEO’s personality impact the performance of their company? Management theory posits a great influence, but it is difficult to show empirically—there is a lack of publicly available self-reported personality data of top managers. Instead, we propose a text-based personality regressor based on crowd-sourced Myers–Briggs Type Indicator (MBTI) assessments. The ratings have a high internal and external validity and can be predicted with moderate to strong correlations for three out of four dimensions. Providing evidence for the upper echelons theory, we demonstrate that the predicted CEO personalities have explanatory power for financial risk. Kilian Theil, Dirk Hovy, Heiner Stuckenschmidt Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22192 Fri, 02 Jun 2023 00:00:00 -0700 Identifying Influential Brokers on Social Media from Social Network Structure https://ojs.aaai.org/index.php/ICWSM/article/view/22193 Identifying influencers in a given social network has become an important research problem for various applications, including accelerating the spread of information in viral marketing and preventing the spread of fake news and rumors. The literature contains a rich body of studies on identifying influential source spreaders who can spread their own messages to many other nodes. In contrast, the identification of influential brokers who can spread other nodes' messages to many nodes has not been fully explored. Theoretical and empirical studies suggest that involvement of both influential source spreaders and brokers is a key to facilitating large-scale information diffusion cascades. Therefore, this paper explores ways to identify influential brokers from a given social network. 
By using three social media datasets, we investigate the characteristics of influential brokers by comparing them with influential source spreaders and central nodes obtained from centrality measures. Our results show that (i) most of the influential source spreaders are not influential brokers (and vice versa) and (ii) the overlap between central nodes and influential brokers is small (less than 15%) in Twitter datasets. We also tackle the problem of identifying influential brokers from centrality measures and node embeddings, and we examine the effectiveness of social network features in the broker identification task. Our results show that (iii) although a single centrality measure cannot characterize influential brokers well, prediction models using node embedding features achieve F1 scores of 0.35--0.68, suggesting the effectiveness of social network features for identifying influential brokers. Sho Tsugawa, Kohei Watabe Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22193 Fri, 02 Jun 2023 00:00:00 -0700 A Multi-Task Model for Sentiment Aided Stance Detection of Climate Change Tweets https://ojs.aaai.org/index.php/ICWSM/article/view/22194 Climate change has become one of the biggest challenges of our time. Social media platforms such as Twitter play an important role in raising public awareness and spreading knowledge about the dangers of the current climate crisis. With the increasing number of campaigns and communication about climate change through social media, the information could create more awareness and reach the general public and policy makers. However, these Twitter communications lead to polarization of beliefs, opinion-dominated ideologies, and often a split into two communities of climate change deniers and believers. 
In this paper, we propose a framework that helps identify denier statements on Twitter and thus classifies the stance of the tweet into one of the two attitudes towards climate change (denier/believer). The sentimental aspects of Twitter data on climate change are deeply rooted in general public attitudes toward climate change. Therefore, our work focuses on learning two closely related tasks: Stance Detection and Sentiment Analysis of climate change tweets. We propose a multi-task framework that performs stance detection (primary task) and sentiment analysis (auxiliary task) simultaneously. The proposed model incorporates the feature-specific and shared-specific attention frameworks to fuse multiple features and learn the generalized features for both tasks. The experimental results show that the proposed framework increases the performance of the primary task, i.e., stance detection, by benefiting from the auxiliary task, i.e., sentiment analysis, compared to its uni-modal and single-task variants. Apoorva Upadhyaya, Marco Fisichella, Wolfgang Nejdl Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22194 Fri, 02 Jun 2023 00:00:00 -0700 An Open-Source Cultural Consensus Approach to Name-Based Gender Classification https://ojs.aaai.org/index.php/ICWSM/article/view/22195 Name-based gender classification has enabled hundreds of otherwise infeasible scientific studies of gender. Yet, the lack of standardization, reliance on paid services, understudied limitations, and conceptual debates cast a shadow over many applications. To address these problems we develop and evaluate an ensemble-based open-source method built on publicly available data of empirical name-gender associations. Our method integrates 36 distinct sources—spanning over 150 countries and more than a century—via a meta-learning algorithm inspired by Cultural Consensus Theory (CCT).
We also construct a taxonomy with which names themselves can be classified. We find that our method's performance is competitive with paid services and that our method, and others, approach the upper limits of performance; we show that conditioning estimates on additional metadata (e.g. cultural context), further combining methods, or collecting additional name-gender association data is unlikely to meaningfully improve performance. This work definitively shows that name-based gender classification can be a reliable part of scientific research and provides a pair of tools, a classification method and a taxonomy of names, that realize this potential. Ian Van Buskirk, Aaron Clauset, Daniel B. Larremore Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22195 Fri, 02 Jun 2023 00:00:00 -0700 Reddit in the Time of COVID https://ojs.aaai.org/index.php/ICWSM/article/view/22196 When the COVID-19 pandemic hit, much of life moved online. Platforms of all types reported surges of activity, and people remarked on the various important functions that online platforms suddenly fulfilled. However, researchers lack a rigorous understanding of the pandemic's impacts on social platforms---and whether they were temporary or long-lasting. We present a conceptual framework for studying the large-scale evolution of social platforms and apply it to the study of Reddit's history, with a particular focus on the COVID-19 pandemic. We study platform evolution through two key dimensions: structure vs. content and macro- vs. micro-level analysis. Structural signals help us quantify how much behavior changed, while content analysis clarifies exactly how it changed. Applying these at the macro-level illuminates platform-wide changes, while at the micro-level we study impacts on individual users. 
We illustrate the value of this approach by showing the extraordinary and ordinary changes Reddit went through during the pandemic. First, we show that typically when rapid growth occurs, it is driven by a few concentrated communities and within a narrow slice of language use. However, Reddit's growth throughout COVID-19 was spread across disparate communities and languages. Second, all groups were equally affected in their change of interest, but veteran users tended to invoke COVID-related language more than newer users. Third, the new wave of users that arrived following COVID-19 was fundamentally different from previous cohorts of new users in terms of interests, activity, and likelihood of staying active on the platform. These findings provide a more rigorous understanding of how an online platform changed during the global pandemic. Veniamin Veselovsky, Ashton Anderson Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22196 Fri, 02 Jun 2023 00:00:00 -0700 Identifying and Characterizing Behavioral Classes of Radicalization within the QAnon Conspiracy on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22197 Social media provide a fertile ground where conspiracy theories and radical ideas can flourish, reach broad audiences, and sometimes lead to hate or violence beyond the online world itself. QAnon represents a notable example of a political conspiracy that started out on social media but turned mainstream, in part due to public endorsement by influential political figures. Nowadays, QAnon conspiracies often appear in the news, are part of political rhetoric, and are espoused by significant swaths of people in the United States. It is therefore crucial to understand how such a conspiracy took root online, and what led so many social media users to adopt its ideas. 
In this work, we propose a framework that exploits both social interaction and content signals to uncover evidence of user radicalization or support for QAnon. Leveraging a large dataset of 240M tweets collected in the run-up to the 2020 US Presidential election, we define and validate a multivariate metric of radicalization. We use that to separate users in distinct, naturally-emerging, classes of behaviors associated with radicalization processes, from self-declared QAnon supporters to hyper-active conspiracy promoters. We also analyze the impact of Twitter's moderation policies on the interactions among different classes: we discover aspects of moderation that succeed, yielding a substantial reduction in the endorsement received by hyperactive QAnon accounts. But we also uncover where moderation fails, showing how QAnon content amplifiers are not deterred or affected by the Twitter intervention. Our findings refine our understanding of online radicalization processes, reveal effective and ineffective aspects of moderation, and call for the need to further investigate the role social media play in the spread of conspiracies. Emily L. Wang, Luca Luceri, Francesco Pierri, Emilio Ferrara Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22197 Fri, 02 Jun 2023 00:00:00 -0700 AnnoBERT: Effectively Representing Multiple Annotators’ Label Choices to Improve Hate Speech Detection https://ojs.aaai.org/index.php/ICWSM/article/view/22198 Supervised machine learning approaches often rely on a "ground truth" label. However, obtaining one label through majority voting ignores the important subjectivity information in tasks such hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. 
In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. AnnoBERT builds unique representations based on each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displayed an advantage in detecting hate speech, especially in the minority class and edge cases with annotator disagreement. Improvement in the overall performance is the largest when the dataset is more label-imbalanced, suggesting its practical value in identifying real-world hate speech, as the volume of hate speech in-the-wild is extremely small on social media, when compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to the model performance, and test a range of alternative annotator embeddings and label text combinations. Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22198 Fri, 02 Jun 2023 00:00:00 -0700 Unique in What Sense? Heterogeneous Relationships between Multiple Types of Uniqueness and Popularity in Music https://ojs.aaai.org/index.php/ICWSM/article/view/22199 How does our society appreciate the uniqueness of cultural products? This fundamental puzzle has intrigued scholars in many fields, including psychology, sociology, anthropology, and marketing. It has been theorized that cultural products that balance familiarity and novelty are more likely to become popular.
However, a cultural product's novelty is typically multifaceted. This paper uses songs as a case study to examine the multiple facets of uniqueness and their relationship with success. We first unpack the multiple facets of a song's novelty or uniqueness and, next, measure its impact on a song's popularity. We employ a series of statistical models to study the relationship between a song's popularity and novelty associated with its lyrics, chord progressions, or audio properties. Our analyses performed on a dataset of over fifty thousand songs find a consistently negative association between all types of song novelty and popularity. Overall, we found a song's lyrics uniqueness to have the most significant association with its popularity. However, audio uniqueness was the strongest predictor of a song's popularity, conditional on the song's genre. We further found the theme and repetitiveness of a song's lyrics to mediate the relationship between the song's popularity and novelty. Broadly, our results contradict the "optimal distinctiveness theory'' (balance between novelty and familiarity) and call for an investigation into the multiple dimensions along which a cultural product's uniqueness could manifest. Yulin Yu, Pui Yin Cheung, Yong-Yeol Ahn, Paramveer S. Dhillon Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22199 Fri, 02 Jun 2023 00:00:00 -0700 Conversation Modeling to Predict Derailment https://ojs.aaai.org/index.php/ICWSM/article/view/22200 Conversations among online users sometimes derail, i.e., break down into personal attacks. Derailment interferes with the healthy growth of communities in cyberspace. The ability to predict whether an ongoing conversation will derail could provide valuable advance, even real-time, insight to both interlocutors and moderators.
Prior approaches predict conversation derailment retrospectively without the ability to forestall the derailment proactively. Some existing works attempt to make dynamic predictions as the conversation develops, but fail to incorporate multisource information, such as conversational structure and distance to derailment. We propose a hierarchical transformer-based framework that combines utterance-level and conversation-level information to capture fine-grained contextual semantics. We propose a domain-adaptive pretraining objective to unite conversational structure information and a multitask learning scheme to leverage the distance from each utterance to derailment. An evaluation of our framework on two conversation derailment datasets shows an improvement in F1 score for the prediction of derailment. These results demonstrate the effectiveness of incorporating multisource information for predicting the derailment of a conversation. Jiaqing Yuan, Munindar P. Singh Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22200 Fri, 02 Jun 2023 00:00:00 -0700 Minority Stress Experienced by LGBTQ Online Communities during the COVID-19 Pandemic https://ojs.aaai.org/index.php/ICWSM/article/view/22201 The COVID-19 pandemic has disproportionately impacted the lives of minorities, such as members of the LGBTQ community (lesbian, gay, bisexual, transgender, and queer) due to pre-existing social disadvantages and health disparities. Although extensive research has been carried out on the impact of the COVID-19 pandemic on different aspects of the general population's lives, few studies are focused on the LGBTQ population. 
In this paper, we develop and evaluate two sets of machine learning classifiers using a pre-pandemic and a during-pandemic dataset to identify Twitter posts exhibiting minority stress, which is a unique pressure faced by the members of the LGBTQ population due to their sexual and gender identities. We demonstrate that our best pre- and during-pandemic models show strong and stable performance for detecting posts that contain minority stress. We investigate the linguistic differences in minority stress posts across pre- and during-pandemic periods. We find that anger words are strongly associated with minority stress during the COVID-19 pandemic. We explore the impact of the pandemic on the emotional states of the LGBTQ population by adopting propensity score-based matching to perform a causal analysis. The results show that the LGBTQ population has a greater increase in the usage of cognitive words and a worsened pattern in the usage of positive emotion words than the group of the general population with similar pre-pandemic behavioral attributes. Our findings have implications for the public health domain and policy-makers to provide adequate support, especially with respect to mental health, to the LGBTQ population during future crises. Yunhao Yuan, Gaurav Verma, Barbara Keller, Talayeh Aledavood Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22201 Fri, 02 Jun 2023 00:00:00 -0700 How Circadian Rhythms Extracted from Social Media Relate to Physical Activity and Sleep https://ojs.aaai.org/index.php/ICWSM/article/view/22202 Circadian rhythm has been linked to both physical and mental health at an individual level in prior research. Such a link at population level has been long hypothesized but has never been tested, largely because of lack of data.
To help fill this gap in the literature, we need: a dataset on population-level circadian rhythms, a dataset on population-level health conditions, and strong associations between these two partly independent sets. Recent work has shown that affect on social media data relates to population-level circadian rhythms. Building upon that work, we extracted five circadian rhythm metrics from 6M Reddit posts across 18 major cities (for which the number of residents is highly correlated with the number of users), and paired them with three ground-truth health metrics (daily number of steps, sleep quantity, and sleep quality) extracted from 233K wearable users in these cities. We found that rhythms of online activity approximated sleeping patterns rather than, what the literature previously hypothesized, alertness levels. Despite that, we found that these rhythms, when computed in two specific times of the day (i.e., late at night and early morning), were still predictive of the three ground-truth health metrics: in general, healthier cities had morning spikes on social media, night dips, and expressions of positive affect. These results suggest that circadian rhythms on social media, if taken at two specific times of the day and operationalized with literature-driven metrics, can approximate the temporal evolution of people's shared underlying biological rhythm as it relates to physical activity (R2=0.492), sleep quantity (R2=0.765), and sleep quality (R2=0.624). Ke Zhou, Marios Constantinides, Daniele Quercia, Sanja Šćepanović Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22202 Fri, 02 Jun 2023 00:00:00 -0700 Who Is behind a Trend? Temporal Analysis of Interactions among Trend Participants on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22203 Trends are a fundamental component of today's fast-evolving media landscape.
Still, a lot of questions about who participates in such trends remain unanswered. Are trends driven by individual actors, or do interactions between actors reveal community structures? If so, do those structures change during the life cycle of a trend or between topically similar trends? In short: Who is behind a trend? This paper contributes to a better understanding of these questions and, in general, actor networks underlying trends on social media. As a case study, we leverage a large Twitter dataset from the EURO2020 soccer competition to detect and analyze topical trends. Our novel Gaussian fitting method allows separating trend life cycles into up- and down-trend components, as well as determining the duration of trends. An event-based evaluation proves good performance results. Given separate trend stages and topically similar trends at different points in time, we conduct a temporal analysis of the actor networks during trends. Our findings not only reveal a large overlap of participants between successive trends but also indicate large variations within a trend life cycle. Furthermore, actor networks seem to be centred around a small number of dominant users and communities. Those users also show large stability across similar trends over time. In contrast, temporally stable community structures are neither found within nor across topically similar trends. 
John Ziegler, Michael Gertz Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22203 Fri, 02 Jun 2023 00:00:00 -0700 Towards Generalization of Machine Learning Models: A Case Study of Arabic Sentiment Analysis https://ojs.aaai.org/index.php/ICWSM/article/view/22204 The abundance of social media data in the Arab world, specifically on Twitter, enabled companies and entities to exploit such rich and beneficial data that could be mined and used to extract important information, including sentiments and opinions of people towards a topic or a merchandise. However, with this plenitude comes the issue of producing models that are able to deliver consistent outcomes when tested within various contexts. Although model generalization has been thoroughly investigated in many fields, it has not been heavily investigated in the Arabic context. To address this gap, we investigate the generalization of models and data in Arabic with application to sentiment analysis, by performing a battery of experiments and building different models that are tested on five independent test sets to understand their performance when presented with unseen data. In doing so, we detail different techniques that improve the generalization of machine learning models in Arabic sentiment analysis, and share a large versatile dataset consisting of approximately 1.64M Arabic tweets and their corresponding sentiment to be used for future research. Our experiments concluded that the most consistent model is trained using a dataset labelled by a cascaded approach of two models, one that labels neutral tweets and another that identifies positive/negative tweets based on the Arabic emoji lexicon after class balancing. 
Both the BERT and the SVM models trained using the refined data achieve average F1 scores of 0.62 and 0.60, with standard deviations of 0.06 and 0.04 respectively, when evaluated on five diverse test sets, outperforming other models by at least 17% relative gain in F1. Based on our experiments, we share recommendations to improve model generalization for classification tasks. Samir Abdaljalil, Shaimaa Hassanein, Hamdy Mubarak, Ahmed Abdelali Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22204 Fri, 02 Jun 2023 00:00:00 -0700 A Multi-Platform Collection of Social Media Posts about the 2022 U.S. Midterm Elections https://ojs.aaai.org/index.php/ICWSM/article/view/22205 Social media are utilized by millions of citizens to discuss important political issues. Politicians use these platforms to connect with the public and broadcast policy positions. Therefore, data from social media has enabled many studies of political discussion. While most analyses are limited to data from individual platforms, people are embedded in a larger information ecosystem spanning multiple social networks. Here we describe and provide access to the Indiana University 2022 U.S. Midterms Multi-Platform Social Media Dataset (MEIU22), a collection of social media posts from Twitter, Facebook, Instagram, Reddit, and 4chan. MEIU22 links to posts about the midterm elections based on a comprehensive list of keywords and tracks the social media accounts of 1,011 candidates from October 1 to December 25, 2022. We also publish the source code of our pipeline to enable similar multi-platform research projects. Rachith Aiyappa, Matthew R.
DeVerna, Manita Pote, Bao Tran Truong, Wanying Zhao, David Axelrod, Aria Pessianzadeh, Zoher Kachwala, Munjung Kim, Ozgur Can Seckin, Minsuk Kim, Sunny Gandhi, Amrutha Manikonda, Francesco Pierri, Filippo Menczer, Kai-Cheng Yang Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22205 Fri, 02 Jun 2023 00:00:00 -0700 Wiki-Based Communities of Interest: Demographics and Outliers https://ojs.aaai.org/index.php/ICWSM/article/view/22206 In this paper, we release data about demographic information and outliers of communities of interest. Identified from Wiki-based sources, mainly Wikidata, the data covers 7.5k communities, e.g., members of the White House Coronavirus Task Force, and 345k subjects, e.g., Deborah Birx. We describe the statistical inference methodology adopted to mine such data. We release subject-centric and group-centric datasets in JSON format, as well as a browsing interface. Finally, we foresee three areas where this dataset can be useful: in social sciences research, it provides a resource for demographic analyses; in web-scale collaborative encyclopedias, it serves as an edit recommender to fill knowledge gaps; and in web search, it offers lists of salient statements about queried subjects for higher user engagement. The dataset can be accessed at: https://doi.org/10.5281/zenodo.7410436 Hiba Arnaout, Simon Razniewski, Jeff Z. Pan Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22206 Fri, 02 Jun 2023 00:00:00 -0700 #RoeOverturned: Twitter Dataset on the Abortion Rights Controversy https://ojs.aaai.org/index.php/ICWSM/article/view/22207 On June 24, 2022, the United States Supreme Court overturned landmark rulings made in its 1973 verdict in Roe v. Wade. The justices, by way of a majority vote in Dobbs v.
Jackson Women's Health Organization, decided that abortion was not a constitutional right and returned the issue of abortion to the elected representatives. This decision triggered multiple protests and debates across the US, especially in the context of the midterm elections in November 2022. Given that many citizens use social media platforms to express their views and mobilize for collective action, and given that online debate has tangible effects on public opinion, political participation, news media coverage, and political decision-making, it is crucial to understand online discussions surrounding this topic. Toward this end, we present the first large-scale Twitter dataset collected on the abortion rights debate in the United States. We present a set of 74M tweets systematically collected over the course of one year from January 1, 2022 to January 6, 2023. Rong-Ching Chang, Ashwin Rao, Qiankun Zhong, Magdalena Wojcieszak, Kristina Lerman Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22207 Fri, 02 Jun 2023 00:00:00 -0700 Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War between Ukraine and Russia https://ojs.aaai.org/index.php/ICWSM/article/view/22208 On February 24, 2022, Russia invaded Ukraine. In the days that followed, reports kept flooding in from laymen to news anchors of a conflict quickly escalating into war. Russia faced immediate backlash and condemnation from the world at large. While the war continues to contribute to an ongoing humanitarian and refugee crisis in Ukraine, a second battlefield has emerged in the online space, both in the use of social media to garner support for both sides of the conflict and also in the context of information warfare.
In this paper, we present a collection of nearly half a billion tweets, from February 22, 2022, through January 8, 2023, that we are publishing for the wider research community to use. This dataset can be found at https://github.com/echen102/ukraine-russia. Our preliminary analysis on a subset of our dataset already shows evidence of public engagement with Russian state-sponsored media and other domains that are known to push unreliable information towards the beginning of the war; the former saw a spike in activity on the day of the Russian invasion, while the latter saw spikes in engagement within the first month of the war. Our hope is that this public dataset can help the research community to further understand the ever-evolving role that social media plays in information dissemination, influence campaigns, grassroots mobilization, and much more, during a time of conflict. Emily Chen, Emilio Ferrara Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22208 Fri, 02 Jun 2023 00:00:00 -0700 HateMM: A Multi-Modal Dataset for Hate Video Classification https://ojs.aaai.org/index.php/ICWSM/article/view/22209 Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world. Due to this, hate speech research has recently gained a lot of traction. However, most of the work has primarily focused on text media with relatively little work on images and even lesser on videos. Thus, early stage automated video moderation techniques are needed to handle the videos that are being uploaded to keep the platform safe and healthy. With a view to detect and remove hateful content from the video sharing platforms, our work focuses on hate video detection using multi-modalities.
To this end, we curate ~43 hours of videos from BitChute and manually annotate them as hate or non-hate, along with the frame spans which could explain the labelling decision. To collect the relevant videos we harnessed search keywords from hate lexicons. We observe various cues in images and audio of hateful videos. Further, we build deep learning multi-modal models to classify the hate videos and observe that using all the modalities of the videos improves the overall hate speech detection performance (accuracy=0.798, macro F1-score=0.790) by ~5.7% compared to the best uni-modal model in terms of macro F1 score. In summary, our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute. Mithun Das, Rohit Raj, Punyajoy Saha, Binny Mathew, Manish Gupta, Animesh Mukherjee Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22209 Fri, 02 Jun 2023 00:00:00 -0700 HealthE: Recognizing Health Advice & Entities in Online Health Communities https://ojs.aaai.org/index.php/ICWSM/article/view/22210 The task of extracting and classifying entities is at the core of important Health-NLP systems such as misinformation detection, medical dialogue modeling, and patient-centric information tools. Granular knowledge of textual entities allows these systems to utilize knowledge bases, retrieve relevant information, and build graphical representations of texts. Unfortunately, most existing works on health entity recognition are trained on clinical notes, which are both lexically and semantically different from public health information found in online health resources or social media. In other words, existing health entity recognizers vastly under-represent the entities relevant to public health data, such as those provided by sites like WebMD. 
It is crucial that future Health-NLP systems be able to model such information, as people rely on online health advice for personal health management and clinically relevant decision making. In this work, we release a new annotated dataset, HealthE, which facilitates the large-scale analysis of online textual health advice. HealthE consists of 3,400 health advice statements with token-level entity annotations. Additionally, we release 2,256 health statements which are not health advice to facilitate health advice mining. HealthE is the first dataset with an entity-recognition label space designed for the modeling of online health advice. We motivate the need for HealthE by demonstrating the limitations of five widely-used health entity recognizers on HealthE, such as those offered by Google and Amazon. We additionally benchmark three pre-trained language models on our dataset as reference for future research. All data is made publicly available. Joseph Gatto, Parker Seegmiller, Garrett M Johnston, Madhusudan Basak, Sarah Masud Preum Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22210 Fri, 02 Jun 2023 00:00:00 -0700 Truth Social Dataset https://ojs.aaai.org/index.php/ICWSM/article/view/22211 Formally announced to the public following former President Donald Trump’s bans and suspensions from mainstream social networks in early 2022 following his role in the January 6 Capitol Riots, Truth Social was launched as an ``alternative'' social media platform that claims to be a refuge for free speech, offering a platform for those disaffected by the content moderation policies of then existing, mainstream social networks. The subsequent rise of Truth Social has been driven largely by hard-line supporters of the former president as well as those affected by the content moderation of other social networks. 
These distinct qualities, combined with its status as the main mouthpiece of the former president, position Truth Social as a particularly influential social media platform and give rise to several research questions. However, outside of a handful of news reports, little is known about the new social media platform, partly due to a lack of well-curated data. In the current work, we describe a dataset of over 823,000 posts to Truth Social and a social network with over 454,000 distinct users. In addition to the dataset itself, we also present some basic analysis of its content, certain temporal features, and its network. Patrick Gerard, Nicholas Botzer, Tim Weninger Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22211 Fri, 02 Jun 2023 00:00:00 -0700 Construction of Evaluation Datasets for Trend Forecasting Studies https://ojs.aaai.org/index.php/ICWSM/article/view/22212 In this study, we discuss issues in the traditional evaluation norms of trend forecasts, outline a suitable evaluation method, propose an evaluation dataset construction procedure, and publish Trend Dataset: the dataset we have created. As trend predictions often yield economic benefits, trend forecasting studies have been widely conducted. However, a consistent and systematic evaluation protocol has yet to be adopted. We consider that the desired evaluation method would address the performance of predicting which entity will trend, when a trend occurs, and how much it will trend, based on a reliable indicator of the general public's recognition as a gold standard. Accordingly, we propose a dataset construction method that includes annotations for trending status (trending or non-trending), degree of trending (how well it is recognized), and the trend period corresponding to a surge in recognition rate.
The proposed method uses questionnaire-based recognition rates interpolated using Internet search volume, enabling trend period annotation on a weekly timescale. The main novelty is that we survey when the respondents recognize the entities that are highly likely to have trended and those that have not. This procedure enables a balanced collection of both trending and non-trending entities. We constructed the dataset and verified its quality. We confirmed that interest in entities, estimated using Wikipedia information, enables the efficient collection of trending entities a priori. We also confirmed that the Internet search volume agrees with the public recognition rate among trending entities. Shogo Matsuno, Sakae Mizuki, Takeshi Sakaki Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22212 Fri, 02 Jun 2023 00:00:00 -0700 VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter https://ojs.aaai.org/index.php/ICWSM/article/view/22213 Vaccine hesitancy has been a common concern, probably since vaccines were created, and, with the popularisation of social media, people started to express their concerns about vaccines online alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users have posted about their fears and concerns, or about their support and belief in the effectiveness of these rapidly developing vaccines. Identifying and understanding the reasons behind public hesitancy towards COVID-19 vaccines is important for policy makers who need to develop actions to better inform the population with the aim of increasing vaccine take-up. In the case of COVID-19, where the fast development of the vaccines was mirrored closely by growth in anti-vaxx disinformation, automatic means of detecting citizen attitudes towards vaccination became necessary.
This is an important computational social science task that requires data analysis in order to gain an in-depth understanding of the phenomena at hand. Annotated data is also necessary for training data-driven models for more nuanced analysis of attitudes towards vaccination. To this end, we created a new collection of over 3,101 tweets annotated with users' attitudes towards COVID-19 vaccination (stance). We also develop a domain-specific language model (VaxxBERT) that achieves the best predictive performance (73.0 accuracy and 69.3 F1-score) compared to a robust set of baselines. To the best of our knowledge, these are the first dataset and model to treat vaccine hesitancy as a category distinct from pro- and anti-vaccine stance. Yida Mu, Mali Jin, Charlie Grimshaw, Carolina Scarton, Kalina Bontcheva, Xingyi Song Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22213 Fri, 02 Jun 2023 00:00:00 -0700 Capturing the Aftermath of the Dobbs v. Jackson Women’s Health Organization Decision in Google Search Results across the U.S. https://ojs.aaai.org/index.php/ICWSM/article/view/22214 How do Google Search results change following an impactful real-world event, such as the U.S. Supreme Court decision on June 24, 2022 to overturn Roe v. Wade? And what do they tell us about the nature of event-driven content, generated by various participants in the online information environment? In this paper, we present a dataset of more than 1.74 million Google Search results pages collected between June 24 and July 17, 2022, intended to capture what Google Search surfaced in response to queries about this event of national importance. These search pages were collected for 65 locations in 13 U.S. states, a mix of red, blue, and purple states with respect to their voting patterns.
We describe the process of building a set of approximately 1,700 phrases used for searching Google, how we gathered the search results for each location, and how these results were parsed to extract information about the most frequently encountered web domains. We believe that this dataset, which comprises raw data (search results as HTML files) and processed data (extracted links organized as CSV files), can be used to answer research questions that are of interest to computational social scientists as well as communication and media studies scholars. Brooke Perreault, Lan Dau, Anya Wintner, Eni Mustafaraj Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22214 Fri, 02 Jun 2023 00:00:00 -0700 Just Another Day on Twitter: A Complete 24 Hours of Twitter Data https://ojs.aaai.org/index.php/ICWSM/article/view/22215 At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected all 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers.
Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change. Jürgen Pfeffer, Daniel Matter, Kokil Jaidka, Onur Varol, Afra Mashhadi, Jana Lasser, Dennis Assenmacher, Siqi Wu, Diyi Yang, Cornelia Brantner, Daniel M. Romero, Jahna Otterbacher, Carsten Schwemmer, Kenneth Joseph, David Garcia, Fred Morstatter Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22215 Fri, 02 Jun 2023 00:00:00 -0700 Codes, Patterns and Shapes of Contemporary Online Antisemitism and Conspiracy Narratives – an Annotation Guide and Labeled German-Language Dataset in the Context of COVID-19 https://ojs.aaai.org/index.php/ICWSM/article/view/22216 Over the course of the COVID-19 pandemic, existing conspiracy theories were refreshed and new ones were created, often interwoven with antisemitic narratives, stereotypes and codes. The sheer volume of antisemitic and conspiracy theory content on the Internet makes data-driven algorithmic approaches essential for anti-discrimination organizations and researchers alike. However, the manifestation and dissemination of these two interrelated phenomena are still under-researched in empirical studies of large text corpora. Algorithmic approaches for the detection and classification of specific contents usually require labeled datasets, annotated based on conceptually sound guidelines. While there is a growing number of datasets for the more general phenomenon of hate speech, the development of corpora and annotation guidelines for antisemitic and conspiracy content is still in its infancy, especially for languages other than English. To address this gap, we have developed an annotation guide for antisemitic and conspiracy theory online content in the context of the COVID-19 pandemic that includes working definitions, e.g.
of specific forms of antisemitism such as encoded and post-Holocaust antisemitism. We use the guide to annotate a German-language dataset consisting of ~3,700 Telegram messages sent between 03/2020 and 12/2021. Elisabeth Steffen, Helena Mihaljevic, Milena Pustet, Nyco Bischoff, Maria do Mar Castro Varela, Yener Bayramoglu, Bahar Oghalai Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22216 Fri, 02 Jun 2023 00:00:00 -0700 Invasion@Ukraine: Providing and Describing a Twitter Streaming Dataset That Captures the Outbreak of War between Russia and Ukraine in 2022 https://ojs.aaai.org/index.php/ICWSM/article/view/22217 Social media can be a mirror of human interaction, society, and historic disruptions. Their reach enables the global dissemination of information in the shortest possible time and, thus, the individual participation of people worldwide in global events in almost real-time. However, these platforms can be equally efficiently used in information warfare to manipulate human perception and opinion formation. Within this paper, we describe a dataset of raw tweets collected via the Twitter Streaming API in the context of the onset of the war, which Russia started in Ukraine on February 24, 2022. A distinctive feature of the dataset is that it covers the period from one week before to one week after Russia's invasion of Ukraine. This paper details the acquisition process and provides first insights into the content of the data stream. In addition, the data has been annotated with availability tags, resulting from rehydration attempts at two points in time: directly after data acquisition and shortly before manuscript submission. This may provide information on Twitter moderation policies. Further, we provide a detailed list of other published datasets covering the same topic.
On the content level, we can show that our dataset comprises several distinct topics related to the conflict and conspiracy narratives, topics that deserve deeper investigation. Therefore, the presented dataset is also made available to the community in an extended version with pseudonymized tweet content upon request. Janina Susanne Pohl, Simon Markmann, Dennis Assenmacher, Christian Grimme Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22217 Fri, 02 Jun 2023 00:00:00 -0700 YouNICon: YouTube’s CommuNIty of Conspiracy Videos https://ojs.aaai.org/index.php/ICWSM/article/view/22218 Conspiracy theories are widely propagated on social media. Among various social media services, YouTube is one of the most influential sources of news and entertainment. This paper seeks to develop a dataset, YOUNICON, to enable researchers to perform conspiracy theory detection as well as classification of videos with conspiracy theories into different topics. YOUNICON is a dataset with a large collection of videos from suspicious channels that were identified to contain conspiracy theories in a previous study. Overall, YOUNICON will enable researchers to study trends in conspiracy theories and understand how individuals can interact with the conspiracy theory producing community or channel. Our data is available at: https://doi.org/10.5281/zenodo.7466262. Shao Yi Liaw, Fan Huang, Fabricio Benevenuto, Haewoon Kwak, Jisun An Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22218 Fri, 02 Jun 2023 00:00:00 -0700 A Dataset of Coordinated Cryptocurrency-Related Social Media Campaigns https://ojs.aaai.org/index.php/ICWSM/article/view/22219 The rise in adoption of cryptoassets has brought many new and inexperienced investors into the cryptocurrency space.
These investors can be disproportionately influenced by information they receive online, and particularly from social media. This paper presents a dataset of crypto-related bounty events and the users that participate in them. These events coordinate social media campaigns to create artificial "hype" around a crypto project in order to influence the price of its token. The dataset consists of information about 15.8K cross-media bounty events, 185K participants, 10M forum comments and 82M social media URLs collected from the Bounties(Altcoins) subforum of the BitcoinTalk online forum from May 2014 to December 2022. We describe the data collection and the data processing methods employed and we present a basic characterization of the dataset. Furthermore, we discuss potential research opportunities afforded by the dataset across many disciplines and we highlight potential novel insights into how the cryptocurrency industry operates and how it interacts with its audience. Karolis Zilius, Tasos Spiliotopoulos, Aad van Moorsel Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22219 Fri, 02 Jun 2023 00:00:00 -0700 Divergences in Following Patterns between Influential Twitter Users and Their Audiences across Dimensions of Identity https://ojs.aaai.org/index.php/ICWSM/article/view/22220 Identity spans multiple dimensions; however, the relative salience of a dimension of identity can vary markedly from person to person. Furthermore, there is often a difference between one’s internal identity (how salient different aspects of one's identity are to oneself) and external identity (how salient different aspects are to the external world). We attempt to capture the internal and external saliences of different dimensions of identity for influential users (“influencers”) on Twitter using the follow graph.
We consider an influencer’s “ego-centric” profile, which is determined by their personal following patterns and is largely in their direct control, and their “audience-centric” profile, which is determined by the following patterns of their audience and is outside of their direct control. Using these following patterns, we calculate a corresponding salience metric that quantifies how important a certain dimension of identity is to an individual. We find that relative to their audiences, influencers exhibit more salience in race in their ego-centric profiles and less in religion and politics. One practical application of these findings is to identify "bridging" influencers that can connect their sizeable audiences to people from traditionally underheard communities. This could potentially increase the diversity of views audiences are exposed to through a trusted conduit (i.e., an influencer they already follow) and may lead to a greater voice for influencers from communities of color or women. Suyash Fulay, Nabeel Gillani, Deb Roy Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22220 Fri, 02 Jun 2023 00:00:00 -0700 Firearms on Twitter: A Novel Object Detection Pipeline https://ojs.aaai.org/index.php/ICWSM/article/view/22221 Social media is an important source of real-time imagery concerning world events. One subset of social media posts which may be of particular interest are those featuring firearms. These posts can give insight into weapon movements, troop activity and civilian safety. Object detection tools offer important opportunities for insight into these images. Unfortunately, these images can be visually complex, poorly lit and generally challenging for object detection models. We present an analysis of existing gun detection datasets and find that these datasets do not effectively address the challenge of gun detection on real-life images.
Following this, we present a novel object detection pipeline. We train our pipeline on a number of datasets, including one created for this investigation, composed of Twitter images from the Russo-Ukrainian War. We compare the performance of our model as trained on the different datasets to baseline numbers provided by the original authors, as well as to a YOLO v5 benchmark. We find that our model outperforms the state-of-the-art benchmarks on contextually rich, real-life-derived imagery of firearms. Ryan Harvey, Rémi Lebret, Stéphane Massonnet, Karl Aberer, Gianluca Demartini Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22221 Fri, 02 Jun 2023 00:00:00 -0700 Auditing Elon Musk’s Impact on Hate Speech and Bots https://ojs.aaai.org/index.php/ICWSM/article/view/22222 On October 27th, 2022, Elon Musk purchased Twitter, becoming its new CEO and firing many top executives in the process. Musk listed fewer restrictions on content moderation and removal of spam bots among his goals for the platform. Given findings of prior research on moderation and hate speech in online communities, the promise of less strict content moderation poses the concern that hate will rise on Twitter. We examine the levels of hate speech and prevalence of bots before and after Musk's acquisition of the platform. We find that hate speech rose dramatically upon Musk purchasing Twitter and the prevalence of most types of bots increased, while the prevalence of astroturf bots decreased. Daniel Hickey, Matheus Schmitz, Daniel Fessler, Paul E.
Smaldino, Goran Muric, Keith Burghardt Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22222 Fri, 02 Jun 2023 00:00:00 -0700 The Amplification Paradox in Recommender Systems https://ojs.aaai.org/index.php/ICWSM/article/view/22223 Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached through other means, e.g., other websites. In this paper, we explain the following apparent paradox: if the recommendation algorithm favors extreme content, why is it not driving its consumption? With a simple agent-based model where users attribute different utilities to items in the recommender system, we show through simulations that the collaborative-filtering nature of recommender systems and the nicheness of extreme content can resolve the apparent paradox: although blindly following recommendations would indeed lead users to niche content, users rarely consume niche content when given the option because it is of low utility to them, which can lead the recommender system to deamplify such content. Our results call for a nuanced interpretation of "algorithmic amplification" and highlight the importance of modeling the utility of content to users when auditing recommender systems. Code available: https://github.com/epfl-dlab/amplification_paradox. 
Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22223 Fri, 02 Jun 2023 00:00:00 -0700 Host-Centric Social Connectedness of Migrants in Europe on Facebook https://ojs.aaai.org/index.php/ICWSM/article/view/22224 Extant literature has explored the social integration process of migrants settling in host communities. However, this literature typically takes a migrant-centric view, implicitly putting the burden of a successful integration on the migrant, and trying to identify the factors that lead to integration along various dimensions. In this paper, we flip this point of view by studying the attributes of natives that govern their propensity to form social ties with migrants. We do so by using anonymous and aggregate social network data provided by Facebook’s advertising platform. More specifically, we look at factors that influence the propensity for a likely-to-be non-Muslim Facebook user to have at least one social connection to a Facebook user who celebrates Ramadan. Given that, in the European context, following Islam is predominantly tied to a migration background, this gives us a lens into cross-cultural native-migrant connectivity. Our study considers demographic attributes of the host population, such as age, gender, and education level, as well as spatial variation across 30 European cities. Our findings suggest that young, educated, and male Facebook users are relatively more likely to build cross-cultural ties, compared to older, less educated, and female Facebook users. We also observe heterogeneity across the analyzed cities.
Aparup Khatua, Emilio Zagheni, Ingmar Weber Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22224 Fri, 02 Jun 2023 00:00:00 -0700 Characterizing Coin-Based Voting Governance in DPoS Blockchains https://ojs.aaai.org/index.php/ICWSM/article/view/22225 Delegated-Proof-of-Stake (DPoS) blockchains are governed by a committee of dozens of members elected via coin-based voting mechanisms. This paper presents a large-scale empirical study of two critical characteristics, personal impact and participation rate, of three leading DPoS blockchains. Our findings reveal the existence of decisive voters whose votes can alter election outcomes, as well as the fact that almost half of the coins have never been used in committee elections. Our research contributes to demystifying the actual use of coin-based voting governance and offers novel insights into the potential security risks of DPoS blockchains. Chao Li, Runhua Xu, Li Duan Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22225 Fri, 02 Jun 2023 00:00:00 -0700 Different Affordances on Facebook and SMS Text Messaging Do Not Impede Generalization of Language-Based Predictive Models https://ojs.aaai.org/index.php/ICWSM/article/view/22226 Adaptive mobile device-based health interventions often use machine learning models trained on non-mobile device data, such as social media text, due to the difficulty and high expense of collecting large text message (SMS) data. Therefore, understanding the differences and generalization of models between these platforms is crucial for proper deployment. We examined the psycho-linguistic differences between Facebook and text messages, and their impact on out-of-domain model performance, using a sample of 120 users who shared both. 
We found that users use Facebook for sharing experiences (e.g., leisure) and SMS for task-oriented and conversational purposes (e.g., plan confirmations), reflecting the differences in the affordances. To examine the downstream effects of these differences, we used pre-trained Facebook-based language models to estimate age, gender, depression, life satisfaction, and stress on both Facebook and SMS. We found no significant differences in correlations between the estimates and self-reports across 6 of 8 models. These results suggest using pre-trained Facebook language models to achieve better accuracy with just-in-time interventions. Tingting Liu, Salvatore Giorgi, Xiangyu Tao, Sharath Chandra Guntuku, Douglas Bellew, Brenda Curtis, Lyle Ungar Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22226 Fri, 02 Jun 2023 00:00:00 -0700 An Example of (Too Much) Hyper-Parameter Tuning In Suicide Ideation Detection https://ojs.aaai.org/index.php/ICWSM/article/view/22227 This work starts with the TWISCO baseline, a benchmark of suicide-related content from Twitter. We find that hyper-parameter tuning can improve this baseline by 9%. We examined 576 combinations of hyper-parameters: learning rate, batch size, epochs and date range of training data. Reasonable settings of learning rate and batch size produce better results than poor settings. Date range is less conclusive. Balancing the date range of the training data to match the benchmark ought to improve performance, but the differences are relatively small. Optimal settings of learning rate and batch size are much better than poor settings, but optimal settings of date range are not that different from poor settings of date range. Finally, we end with concerns about reproducibility. Of the 576 experiments, 10% produced F1 performance above baseline. 
It is common practice in the literature to run many experiments and report the best, but doing so may be risky, especially given the sensitive nature of suicide ideation detection. Annika Marie Schoene, John Ortega, Silvio Amir, Kenneth Church Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22227 Fri, 02 Jun 2023 00:00:00 -0700 The Half-Life of a Tweet https://ojs.aaai.org/index.php/ICWSM/article/view/22228 Twitter has started to share an impression count variable as part of the available public metrics for every Tweet collected with Twitter’s APIs. With the information about how often a particular Tweet has been shown to Twitter users at the time of data collection, we can learn important insights about the dissemination process of a Tweet by measuring its impression count repeatedly over time. With our preliminary analysis, we can show that the peak of impressions per second occurs, on average, 72 seconds after a Tweet was sent, and that after 24 hours, no relevant number of impressions can be observed for ∼95% of all Tweets. Finally, we estimate that the median half-life of a Tweet, i.e. the time it takes before half of all impressions are created, is about 80 minutes. Jürgen Pfeffer, Daniel Matter, Anahit Sargsyan Copyright (c) 2023 Association for the Advancement of Artificial Intelligence https://ojs.aaai.org/index.php/ICWSM/article/view/22228 Fri, 02 Jun 2023 00:00:00 -0700