Latent Set Models for Two-Mode Network Data

Christopher DuBois; James Foulds; Padhraic Smyth

doi:10.1609/icwsm.v5i1.14131

Authors

Christopher DuBois, University of California, Irvine
James Foulds, University of California, Irvine
Padhraic Smyth, University of California, Irvine

DOI:

https://doi.org/10.1609/icwsm.v5i1.14131

Abstract

Two-mode networks are a natural representation for many kinds of relational data. These networks are bipartite graphs consisting of two distinct sets ("modes") of entities. For example, one can model multiple-recipient email data as a two-mode network of (a) individuals and (b) the emails that they send or receive. In this work we present a statistical model for two-mode network data which posits that individuals belong to latent sets and that the members of a particular set tend to co-appear. We show how to infer these latent sets from observed data using a Markov chain Monte Carlo inference algorithm. We apply the model to the Enron email corpus, using it to discover interpretable latent structure and evaluating its predictive accuracy on a missing-data task. We also discuss extensions to the model that incorporate additional side information, such as an email's sender or text content, further improving the accuracy of the model.
