Generating Realistic Interest-Driven Information Cascades
We propose a model for the synthetic generation of information cascades in social media. In our model the information “memes” propagating in the social network are characterized by a probability distribution in a topic space, accompanied by a textual description, i.e., a bag of keywords coherent with the topic distribution. Similarly, every user of the social media is described by a vector of interests defined over the same topic space. Information cascades are governed by the topic of the meme, its level of virality, the interests of each user, community pressure, and social influence.
The main technical challenge we face towards our goal is the generation of realistic interest vectors, given a known network structure and a tunable level of homophily. We tackle this problem by means of a method based on non-negative matrix factorization, which is shown experimentally to outperform non-trivial baselines based on label propagation and random-walk-based graph embedding.
As we showcase in our experiments, our model offers a small set of simple and easily interpretable “knobs” which allow to study, in vitro, how each set of assumptions affects the resulting propagations. Finally, we show how to generate synthetic cascades that have similar macro-statistics to the real-world cascades for a dataset containing both the network and the cascades.