FusedRec: Fused Embedding Communication for Distributed Recommendation Training on GPUs

Authors

  • Xuanteng Huang, Sun Yat-sen University
  • Fan Li, Tencent
  • Riyang Hu, Tencent
  • Jianchang Zhang, Tencent
  • Yuan Peng, Tencent
  • Yang Zhou, Tencent
  • Fangying Chen, Tencent
  • Xianwei Zhang, Sun Yat-sen University

DOI:

https://doi.org/10.1609/aaai.v40i17.38512

Abstract

Recent years have witnessed the wide adoption of deep learning recommendation models (DLRMs) in many online services. Unlike traditional DNN training, DLRMs leverage massive embeddings to represent sparse features, which are stored across distributed GPUs following the model-parallel paradigm. Existing approaches adopt deduplication to eliminate replicated embeddings in AlltoAll transfers and thereby avoid unnecessary communication. In practice, we have observed that such a deduplication design exacerbates interconnect inefficiency: the embedding transfers become fragmented, with reduced message sizes, which hinders the performance of distributed DLRM training. This paper introduces FusedRec, a fused embedding communication and lookup mechanism that tackles the inefficiency caused by deduplication. By seeking opportunities to fuse embeddings from multiple categories into a group, FusedRec conducts the communication in a single combined shot to alleviate bandwidth under-utilization. Meanwhile, a categorical-aware hashing algorithm is integrated into FusedRec to retain the category information during lookup without extra communication. Combined with efficient unique and recovery operations, FusedRec achieves a 37.8% throughput speedup on average over the SOTA industry implementation in comprehensive experiments, without hurting the recommendation quality of our in-house models used in online production environments.
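
The ideas named in the abstract (fusing IDs from several categories, keeping the category recoverable, and unique/recovery around the transfer) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract does not describe the hashing algorithm, so simple bit-packing stands in for it here, and the names `fuse_keys`, `split_key`, `unique_with_inverse`, `recover`, and the 48-bit ID width are all hypothetical. No actual AlltoAll communication is performed.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

ID_BITS = 48  # assumption: raw feature IDs fit in 48 bits


def fuse_keys(ids_by_category: Dict[int, List[int]]) -> List[int]:
    """Pack (category, raw_id) into one integer key per lookup.

    Stand-in for a category-preserving hash: IDs from different
    categories can share one buffer without colliding.
    """
    fused = []
    for category, ids in sorted(ids_by_category.items()):
        for raw_id in ids:
            if not 0 <= raw_id < (1 << ID_BITS):
                raise ValueError(f"raw_id {raw_id} does not fit in {ID_BITS} bits")
            fused.append((category << ID_BITS) | raw_id)
    return fused


def split_key(key: int) -> Tuple[int, int]:
    """Recover (category, raw_id) from a fused key."""
    return key >> ID_BITS, key & ((1 << ID_BITS) - 1)


def unique_with_inverse(keys: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Deduplicate keys in first-seen order and record, for every
    input position, the index of its key in the unique list."""
    index_of: Dict[int, int] = {}
    unique: List[int] = []
    inverse: List[int] = []
    for key in keys:
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(key)
        inverse.append(index_of[key])
    return unique, inverse


def recover(unique_rows: Sequence[T], inverse: Sequence[int]) -> List[T]:
    """Expand rows fetched for the unique keys back to the original order."""
    return [unique_rows[i] for i in inverse]


# Two categories that happen to share the raw ID 7.
batch = {0: [7, 7, 3], 1: [7, 5, 5]}
keys = fuse_keys(batch)
unique, inverse = unique_with_inverse(keys)

# In a distributed setting, `unique` would be sent in one fused transfer;
# here a fake lookup returns a label instead of an embedding vector.
rows = [f"emb(cat={c}, id={i})" for c, i in map(split_key, unique)]
restored = recover(rows, inverse)
```

In this sketch the six lookups collapse into four unique keys that travel as a single buffer rather than one smaller buffer per category, and `split_key` shows that the category can be read back from the key itself without a side channel.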

Published

2026-03-14

How to Cite

Huang, X., Li, F., Hu, R., Zhang, J., Peng, Y., Zhou, Y., … Zhang, X. (2026). FusedRec: Fused Embedding Communication for Distributed Recommendation Training on GPUs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(17), 14910–14918. https://doi.org/10.1609/aaai.v40i17.38512

Section

AAAI Technical Track on Data Mining & Knowledge Management I