Sampling Representative Users from Large Social Networks

Jie Tang; Chenhui Zhang; Keke Cai; Li Zhang; Zhong Su

doi:10.1609/aaai.v29i1.9202

Authors

Jie Tang Tsinghua University
Chenhui Zhang Tsinghua University
Keke Cai IBM CRL
Li Zhang IBM CRL
Zhong Su IBM CRL

DOI:

https://doi.org/10.1609/aaai.v29i1.9202

Keywords:

network sampling, social networks

Abstract

Finding a subset of users to statistically represent the original social network is a fundamental issue in Social Network Analysis (SNA). The problem has not been extensively studied in existing literature. In this paper, we present a formal definition of the problem of \textbf{sampling representative users} from social network. We propose two sampling models and theoretically prove their NP-hardness. To efficiently solve the two models, we present an efficient algorithm with provable approximation guarantees. Experimental results on two datasets show that the proposed models for sampling representative users significantly outperform (+6\%-23\% in terms of Precision@100) several alternative methods using authority or structure information only. The proposed algorithms are also effective in terms of time complexity. Only a few seconds are needed to sampling ~300 representative users from a network of 100,000 users.All data and codes are publicly available.

Sampling Representative Users from Large Social Networks

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information