Efficient Device Scheduling with Multi-Job Federated Learning

Authors

  • Chendi Zhou, Soochow University
  • Ji Liu, Baidu Research
  • Juncheng Jia, Soochow University
  • Jingbo Zhou, Baidu Research
  • Yang Zhou, Auburn University
  • Huaiyu Dai, NC State University
  • Dejing Dou, Baidu Research

DOI:

https://doi.org/10.1609/aaai.v36i9.21235

Keywords:

Planning, Routing, and Scheduling (PRS), Multiagent Systems (MAS)

Abstract

Recent years have witnessed a large amount of decentralized data on the (edge) devices of end users, yet aggregating this data for machine learning jobs remains difficult because of laws or regulations. Federated Learning (FL) has emerged as an effective approach to handling decentralized data: it collaboratively trains global machine learning models without sharing the sensitive raw data. The servers in FL need to select (and schedule) devices during the training process. However, scheduling devices for multiple jobs with FL remains a critical and open problem. In this paper, we propose a novel multi-job FL framework that enables multiple jobs to be trained in parallel. The framework consists of a system model and two scheduling methods. In the system model, we propose a parallel training process for multiple jobs and construct a cost model based on the training time and the data fairness of the devices during the training of diverse jobs. We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost. We conduct extensive experiments with multiple jobs and datasets. The experimental results show that our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).

Published

2022-06-28

How to Cite

Zhou, C., Liu, J., Jia, J., Zhou, J., Zhou, Y., Dai, H., & Dou, D. (2022). Efficient Device Scheduling with Multi-Job Federated Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9), 9971-9979. https://doi.org/10.1609/aaai.v36i9.21235

Section

AAAI Technical Track on Planning, Routing, and Scheduling