FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning


  • Yulei Qin Tencent YouTu Lab
  • Xingyu Chen Tencent YouTu Lab
  • Chao Chen Tencent YouTu Lab
  • Yunhang Shen Tencent YouTu Lab
  • Bo Ren Tencent YouTu Lab
  • Yun Gu Shanghai Jiao Tong University
  • Jie Yang Shanghai Jiao Tong University
  • Chunhua Shen Zhejiang University



CV: Representation Learning for Vision


Recently, webly supervised learning (WSL) has been studied to leverage the abundant and easily accessible data on the Internet. Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the gap between the web domain and the real-world domain. Only by closing this gap can the practical value of web datasets be fully exploited. To this end, we propose a Few-shot guided Prototypical (FoPro) representation learning method, which requires only a few labeled real-world examples and can significantly improve performance in the real-world domain. Specifically, we initialize each class center with few-shot real-world data as the "realistic" prototype. Then, the intra-class distance between web instances and "realistic" prototypes is narrowed by contrastive learning. Finally, we measure the image-prototype distance with a learnable metric. Prototypes are polished by adjacent high-quality web images and are involved in removing distant out-of-distribution samples. In experiments, FoPro is trained on web datasets under the guidance of a few real-world examples and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Compared with existing WSL methods under the same few-shot settings, FoPro still excels in real-world generalization. Code is available at
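The pipeline described in the abstract — initializing class prototypes from few-shot real-world embeddings, pulling web instances toward their class prototype with a contrastive objective, and removing distant out-of-distribution web samples — can be sketched as follows. This is an illustrative simplification, not the authors' implementation: temperature-scaled cosine similarity stands in for the paper's learnable metric, and the function names and threshold are hypothetical.

```python
import numpy as np

def init_prototypes(fewshot_feats, labels, num_classes):
    """Initialize each class prototype as the L2-normalized mean of the
    few-shot real-world embeddings belonging to that class."""
    d = fewshot_feats.shape[1]
    protos = np.zeros((num_classes, d))
    for c in range(num_classes):
        mean = fewshot_feats[labels == c].mean(axis=0)
        protos[c] = mean / np.linalg.norm(mean)
    return protos

def proto_contrastive_loss(web_feats, web_labels, protos, temperature=0.1):
    """InfoNCE-style loss pulling each web embedding toward its class
    prototype and pushing it away from the other prototypes.  Cosine
    similarity with a temperature replaces the learnable metric."""
    feats = web_feats / np.linalg.norm(web_feats, axis=1, keepdims=True)
    logits = feats @ protos.T / temperature           # (N, C) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(web_labels)), web_labels].mean()

def filter_ood(web_feats, web_labels, protos, sim_threshold=0.3):
    """Mark web samples whose similarity to their own class prototype
    falls below a threshold as out-of-distribution (returns a keep mask)."""
    feats = web_feats / np.linalg.norm(web_feats, axis=1, keepdims=True)
    sims = (feats * protos[web_labels]).sum(axis=1)
    return sims >= sim_threshold
```

In the full method the prototypes would also be updated ("polished") with the embeddings of nearby high-quality web images during training; the sketch above only shows the initialization, alignment, and filtering steps.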




How to Cite

Qin, Y., Chen, X., Chen, C., Shen, Y., Ren, B., Gu, Y., Yang, J., & Shen, C. (2023). FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2101-2109.



AAAI Technical Track on Computer Vision II