Large-Scale Community Detection on YouTube for Topic Discovery and Exploration


  • Ullas Gargi, Google, Inc.
  • Wenjun Lu, University of Maryland
  • Vahab Mirrokni, Google, Inc.
  • Sangho Yoon, Google, Inc.


Detecting coherent, well-connected communities in large graphs provides insight into the graph structure and can serve as the basis for content discovery. Clustering is a popular technique for community detection, but global algorithms that examine the entire graph do not scale. Local algorithms are highly parallelizable but perform sub-optimally, especially in applications where we need to optimize multiple metrics. We present a highly scalable multi-stage algorithm based on local clustering that combines a pre-processing stage, a local clustering stage, and a post-processing stage. We apply it to the YouTube video graph to generate named clusters of videos with coherent content. We formalize coverage, coherence, and connectivity metrics and evaluate the quality of the algorithm on large YouTube graphs. Our use of local algorithms for global clustering, together with its implementation and practical evaluation at such a large scale, is the first of its kind.
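The three-stage structure described above can be illustrated with a small sketch. The abstract does not specify what each stage does, so every concrete choice here is an assumption made for illustration: edge-weight thresholding as pre-processing, greedy seed expansion that lowers conductance as the local clustering step, and de-duplication plus a minimum-size filter as post-processing. The function names, the parameters (`min_weight`, `max_size`, `min_size`), and the coverage definition (fraction of nodes that fall in at least one cluster) are likewise hypothetical and not taken from the paper.

```python
from collections import defaultdict


def prune_edges(edges, min_weight):
    """Pre-processing (assumed): drop weak edges, build a symmetric adjacency map."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        if u != v and w >= min_weight:
            adj[u][v] = w
            adj[v][u] = w
    return dict(adj)


def conductance(adj, cluster):
    """Cut weight divided by the smaller of the two side volumes."""
    total = sum(w for nbrs in adj.values() for w in nbrs.values())
    vol = cut = 0.0
    for u in cluster:
        for v, w in adj.get(u, {}).items():
            vol += w
            if v not in cluster:
                cut += w
    denom = min(vol, total - vol)
    return cut / denom if denom > 0 else 0.0 if cut == 0 else 1.0


def grow_cluster(adj, seed, max_size=50):
    """Local clustering (assumed): greedily add the neighbor that most lowers conductance."""
    cluster = {seed}
    best = conductance(adj, cluster)
    while len(cluster) < max_size:
        frontier = {v for u in cluster for v in adj[u]} - cluster
        if not frontier:
            break
        score, node = min((conductance(adj, cluster | {v}), v) for v in frontier)
        if score >= best:  # no neighbor improves the cluster, so stop
            break
        cluster.add(node)
        best = score
    return frozenset(cluster)


def detect_communities(edges, min_weight=0.5, max_size=50, min_size=2):
    """Run the three assumed stages and return (adjacency, clusters)."""
    adj = prune_edges(edges, min_weight)
    # Each seed is expanded independently, so this step could run in parallel.
    raw = {grow_cluster(adj, seed, max_size) for seed in adj}
    # Post-processing (assumed): the set removes duplicates; drop tiny clusters.
    clusters = sorted((c for c in raw if len(c) >= min_size), key=sorted)
    return adj, clusters


def coverage(adj, clusters):
    """Fraction of graph nodes that belong to at least one cluster."""
    covered = set().union(*clusters) if clusters else set()
    return len(covered & set(adj)) / len(adj) if adj else 0.0


# Toy graph: two triangles joined by one weak edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 0.1),
]
adj, clusters = detect_communities(edges)
print([sorted(c) for c in clusters], coverage(adj, clusters))
```

On this toy graph the weak bridging edge is pruned and each triangle is recovered as its own cluster. Because every seed is processed without reference to the others, the local stage is the part that lends itself to parallel execution, which is the property the abstract highlights.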




How to Cite

Gargi, U., Lu, W., Mirrokni, V., & Yoon, S. (2021). Large-Scale Community Detection on YouTube for Topic Discovery and Exploration. Proceedings of the International AAAI Conference on Web and Social Media, 5(1), 486-489. Retrieved from