Should We Be Confident in Peer Effects Estimated From Social Network Crawls?
Research in social network analysis and statistical relational learning has produced a number of methods for learning relational models from large-scale network data. Unfortunately, these methods have been developed under the unrealistic assumption of full data access. In practice, however, the data are often collected by crawling the network, due to proprietary access, limited resources, and privacy concerns. While prior studies have examined the impact of network crawling on the structural characteristics of the resulting samples, this work presents the first empirical study designed to assess the impact of widely used network crawlers on the estimation of peer effects. Our experiments demonstrate that the estimates obtained from network samples collected by existing crawlers can be quite inaccurate, unless a significant portion of the network is crawled. Meanwhile, motivated by recent advances in partial network crawling, we develop crawl-aware relational methods that provide accurate estimates of peer effects with statistical guarantees from partial crawls.