CoLink: An Unsupervised Framework for User Identity Linkage

Authors

  • Zexuan Zhong, University of Illinois at Urbana-Champaign
  • Yong Cao, Microsoft Research
  • Mu Guo, Microsoft Research
  • Zaiqing Nie, Alibaba AI Labs

DOI:

https://doi.org/10.1609/aaai.v32i1.12014

Keywords:

User Identity Linkage, Sequence-to-sequence learning

Abstract

It is now very common for one person to have accounts on several social networks. Linking the accounts that belong to the same person across different social networks, known as the User Identity Linkage (UIL) problem, is fundamental to many applications. The UIL problem poses two major challenges. First, collecting manually linked user pairs as training data is extremely expensive. Second, user attributes are usually defined and formatted very differently across networks, which makes attribute alignment very hard. In this paper we propose CoLink, a general unsupervised framework for the UIL problem. CoLink employs a co-training algorithm that runs two independent models, an attribute-based model and a relationship-based model, and makes them reinforce each other iteratively without supervision. We also propose sequence-to-sequence learning as a very effective implementation of the attribute-based model: it handles the attribute alignment challenge well by treating it as a machine translation problem. We apply CoLink to a UIL task that maps the employees in an enterprise network to their LinkedIn profiles. Experimental results show that CoLink generally outperforms state-of-the-art unsupervised approaches, with an F1 increase of over 20%.
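To make the co-training idea more concrete, here is a minimal Python sketch built on several assumptions, none of which come from the paper. It uses toy data. The attribute model is a hypothetical stand-in for the sequence-to-sequence model. It learns a token-to-token "translation" table, such as bob to robert, from pairs that are already linked. The relationship model counts neighbours that are already linked to each other. Initial seed links come from exact attribute matches. The names `AttributeModel`, `relationship_score` and `colink_sketch` are illustrative, as are the thresholds, the "accept only unambiguous proposals" rule, and the exact way the two models exchange links.

```python
from collections import Counter

# Hypothetical toy data: enterprise accounts (e*) and LinkedIn-style profiles (l*).
SRC_ATTR = {"e1": "Alice Wong", "e2": "Bob Smith", "e3": "Carol Diaz",
            "e4": "Dan Lee", "e5": "Bob Jones"}
TGT_ATTR = {"l1": "Alice Wong", "l2": "Robert Smith", "l3": "Carol Diaz",
            "l4": "Dan Lee", "l5": "Robert Jones"}
# Undirected relationship graphs as adjacency sets (also hypothetical).
SRC_GRAPH = {"e1": {"e2"}, "e2": {"e1", "e3"}, "e3": {"e2"},
             "e4": {"e5"}, "e5": {"e4"}}
TGT_GRAPH = {"l1": {"l2"}, "l2": {"l1", "l3"}, "l3": {"l2"},
             "l4": {"l5"}, "l5": {"l4"}}


def tokens(text):
    return set(text.lower().split())


class AttributeModel:
    """Assumed stand-in for the seq2seq attribute model (not the paper's):
    'translates' source tokens with a table learned from linked pairs."""

    def __init__(self):
        self.table = {}

    def fit(self, pairs, src_attr, tgt_attr):
        counts = Counter()
        for s, t in pairs:
            a, b = tokens(src_attr[s]), tokens(tgt_attr[t])
            only_a, only_b = a - b, b - a
            # Learn a mapping only when exactly one token differs on each side.
            if len(only_a) == 1 and len(only_b) == 1:
                counts[(next(iter(only_a)), next(iter(only_b)))] += 1
        self.table = {}
        for (src_tok, tgt_tok), _ in counts.most_common():
            self.table.setdefault(src_tok, tgt_tok)  # keep the most frequent mapping

    def score(self, s, t, src_attr, tgt_attr):
        # Jaccard similarity after translating the source tokens.
        a = {self.table.get(w, w) for w in tokens(src_attr[s])}
        b = tokens(tgt_attr[t])
        union = a | b
        return len(a & b) / len(union) if union else 0.0


def relationship_score(s, t, links, src_graph, tgt_graph):
    # Number of neighbours of s that are already linked to a neighbour of t.
    tgt_neighbours = tgt_graph.get(t, set())
    return sum(1 for u in src_graph.get(s, set()) if links.get(u) in tgt_neighbours)


def colink_sketch(src_attr, tgt_attr, src_graph, tgt_graph,
                  attr_threshold=1.0, rel_threshold=2, max_rounds=10):
    links = {}  # source id -> target id, kept one-to-one
    attr_model = AttributeModel()
    for _ in range(max_rounds):
        # The attribute model retrains on every link found so far, including
        # links that only the relationship model could find.
        attr_model.fit(links.items(), src_attr, tgt_attr)
        taken = set(links.values())
        proposals = {}
        for s in src_attr:
            if s in links:
                continue
            cands = [t for t in tgt_attr if t not in taken and (
                attr_model.score(s, t, src_attr, tgt_attr) >= attr_threshold
                or relationship_score(s, t, links, src_graph, tgt_graph) >= rel_threshold)]
            if len(cands) == 1:  # accept only unambiguous proposals
                proposals[s] = cands[0]
        # Drop proposals where two sources claim the same target.
        claims = Counter(proposals.values())
        new = {s: t for s, t in proposals.items() if claims[t] == 1}
        if not new:
            break  # neither model found anything new
        links.update(new)
    return links


print(colink_sketch(SRC_ATTR, TGT_ATTR, SRC_GRAPH, TGT_GRAPH))
```

On this toy data the sketch links e1, e3 and e4 first by exact attribute match. The relationship model then links e2 ("Bob Smith") to l2 ("Robert Smith") through their two linked neighbours. From that pair the attribute model learns bob to robert, which finally lets it link e5 to l5 ("Robert Jones"). Neither model alone would have found that last link.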


Published

2018-04-27

How to Cite

Zhong, Z., Cao, Y., Guo, M., & Nie, Z. (2018). CoLink: An Unsupervised Framework for User Identity Linkage. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12014