Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles between Client Data Subspaces

Authors

  • Saeed Vahidian, University of California San Diego
  • Mahdi Morafah, University of California San Diego
  • Weijia Wang, University of California San Diego
  • Vyacheslav Kungurtsev, Czech Technical University
  • Chen Chen, University of Central Florida
  • Mubarak Shah, University of Central Florida
  • Bill Lin, University of California San Diego

DOI:

https://doi.org/10.1609/aaai.v37i8.26197

Keywords:

ML: Deep Neural Network Algorithms, ML: Distributed Machine Learning & Federated Learning

Abstract

Clustered federated learning (FL) has been shown to produce promising results by grouping clients into clusters. This is especially effective in scenarios where separate groups of clients have significant differences in the distributions of their local data. Existing clustered FL algorithms essentially aim to group together clients with similar distributions so that clients in the same cluster can leverage each other's data to better perform federated learning. However, prior clustered FL algorithms attempt to learn these distribution similarities indirectly during training, which can be quite time consuming because many rounds of federated learning may be required before cluster formation stabilizes. In this paper, we propose a new approach to federated learning that directly and efficiently identifies distribution similarities among clients by analyzing the principal angles between the client data subspaces. Each client applies a truncated singular value decomposition (SVD) step to its local data in a single-shot manner to derive a small set of principal vectors, which provides a signature that succinctly captures the main characteristics of the underlying distribution. This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients and form clusters, by comparing the principal angles between the client data subspaces spanned by those principal vectors. The result is a simple yet effective clustered FL framework that addresses a broad range of data heterogeneity issues beyond simpler forms of non-IIDness such as label skew. Our clustered FL approach also enables convergence guarantees for non-convex objectives.
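The sketch below illustrates the idea described in the abstract, not the authors' released implementation: each client summarizes its local data with a few left singular vectors from a single truncated SVD, and the server measures pairwise dissimilarity via the principal angles between the subspaces spanned by those vectors before grouping clients. The function names, the number of principal vectors, the use of SciPy's `subspace_angles`, and the choice of agglomerative clustering are illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline (illustrative, not the paper's code):
# client-side truncated SVD signatures + server-side clustering by principal angles.
import numpy as np
from scipy.linalg import subspace_angles
from scipy.cluster.hierarchy import linkage, fcluster

def client_signature(data, num_vectors=3):
    """Top `num_vectors` left singular vectors of a (features x samples) data matrix.

    Computed once on the client (single shot); the columns form an orthonormal
    basis for the client's data subspace."""
    U, _, _ = np.linalg.svd(data, full_matrices=False)
    return U[:, :num_vectors]

def proximity_matrix(signatures):
    """Pairwise dissimilarity: sum of principal angles between client subspaces."""
    n = len(signatures)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            angles = subspace_angles(signatures[i], signatures[j])  # radians
            D[i, j] = D[j, i] = np.sum(angles)
    return D

def cluster_clients(signatures, num_clusters=2):
    """Group clients whose data subspaces are close (small principal angles)."""
    D = proximity_matrix(signatures)
    # condensed upper-triangular distances for hierarchical clustering
    condensed = D[np.triu_indices(len(signatures), k=1)]
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=num_clusters, criterion="maxclust")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two synthetic "distributions": clients 0-2 share one subspace, 3-5 another
    basis_a, basis_b = rng.normal(size=(64, 3)), rng.normal(size=(64, 3))
    clients = [basis_a @ rng.normal(size=(3, 200)) for _ in range(3)] + \
              [basis_b @ rng.normal(size=(3, 200)) for _ in range(3)]
    sigs = [client_signature(x) for x in clients]
    print(cluster_clients(sigs, num_clusters=2))  # e.g. [1 1 1 2 2 2]
```

Only the small set of principal vectors (not the raw data) would be sent to the server in this scheme, which is what makes the one-shot clustering step communication-efficient.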

Published

2023-06-26

How to Cite

Vahidian, S., Morafah, M., Wang, W., Kungurtsev, V., Chen, C., Shah, M., & Lin, B. (2023). Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles between Client Data Subspaces. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 10043-10052. https://doi.org/10.1609/aaai.v37i8.26197

Section

AAAI Technical Track on Machine Learning III