Advancing Federated Learning by Addressing Data and System Heterogeneity


  • Yiran Chen, Duke University



Distributed Machine Learning & Federated Learning, Privacy-Aware ML, Deep Learning Theory


In the emerging field of federated learning (FL), heterogeneity in both data and systems presents significant obstacles to efficient and effective model training. This talk focuses on recent advances that address these challenges. The first part of the talk delves into data heterogeneity, a core issue in FL: data distributions vary widely across clients, which affects FL convergence. We will introduce the FedCor framework, which addresses this by modeling loss correlations between clients with a Gaussian process and reducing the expected global loss. We also uncover external covariate shift in FL, demonstrating that normalization layers are crucial and that layer normalization proves effective. Additionally, class imbalance degrades FL performance; our proposed Federated Class-balanced Sampling (Fed-CBS) mechanism reduces this imbalance while employing homomorphic encryption to preserve privacy. The second part of the talk shifts to system heterogeneity, an equally critical challenge in FL: participating devices differ in computational capability, network speed, and other resource constraints. To address this, we introduce FedSEA, a semi-asynchronous FL framework that addresses accuracy drops by balancing aggregation frequency and predicting the arrival of local updates. We also discuss FedRepre, a framework designed to enhance FL in real-world environments by addressing unbalanced local dataset distributions, uneven computational capabilities, and fluctuating network speeds. By introducing a client selection mechanism and a specialized server architecture, FedRepre notably improves the efficiency, scalability, and performance of FL systems. The talk aims to provide a comprehensive overview of current research on tackling both data and system heterogeneity in federated learning. We hope to highlight the path forward for FL, underlining its potential in diverse real-world applications while maintaining data privacy and optimizing resource usage.
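
As a rough illustration of the class-imbalance idea mentioned above, the following sketch scores a group of clients by how far their pooled label distribution is from uniform and greedily picks a group that keeps that score low. It is not the Fed-CBS algorithm itself: the imbalance measure, the greedy rule, and the names `imbalance`, `greedy_select`, and `client_counts` are assumptions made for illustration, and the label counts are handled in plaintext with no homomorphic encryption.

```python
from typing import Dict, List, Sequence


def imbalance(counts: Sequence[int]) -> float:
    """Squared L2 distance between a class distribution and the uniform one."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty group has no imbalance to measure
    k = len(counts)
    return sum((c / total - 1.0 / k) ** 2 for c in counts)


def greedy_select(client_counts: Dict[str, Sequence[int]], m: int) -> List[str]:
    """Greedily pick up to m clients whose pooled labels stay close to balanced.

    client_counts maps a (hypothetical) client id to its per-class sample counts.
    """
    if not client_counts:
        return []
    num_classes = len(next(iter(client_counts.values())))
    pooled = [0] * num_classes          # running per-class totals of the chosen group
    chosen: List[str] = []
    remaining = dict(client_counts)
    for _ in range(min(m, len(remaining))):
        # Choose the client that minimizes imbalance once added to the group.
        best = min(
            remaining,
            key=lambda cid: imbalance(
                [p + c for p, c in zip(pooled, remaining[cid])]
            ),
        )
        pooled = [p + c for p, c in zip(pooled, remaining.pop(best))]
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy per-client label counts for a two-class problem (made-up numbers).
    clients = {"a": [90, 10], "b": [10, 90], "c": [80, 20]}
    print(greedy_select(clients, 2))
```

In this toy setting the two selected clients have complementary label skews, so their pooled data is far more balanced than any single client's data. A privacy-preserving variant would need to compute the score without revealing individual clients' label counts, which is the role the abstract assigns to homomorphic encryption.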