Efficient Device Scheduling with Multi-Job Federated Learning
Keywords: Planning, Routing, and Scheduling (PRS), Multiagent Systems (MAS)
Abstract
Recent years have witnessed a large amount of decentralized data in multiple (edge) devices of end-users, while the aggregation of the decentralized data remains difficult for machine learning jobs due to laws or regulations. Federated Learning (FL) emerges as an effective approach to handling decentralized data without sharing the sensitive raw data, while collaboratively training global machine learning models. The servers in FL need to select (and schedule) devices during the training process. However, the scheduling of devices for multiple jobs with FL remains a critical and open problem. In this paper, we propose a novel multi-job FL framework to enable the parallel training process of multiple jobs. The framework consists of a system model and two scheduling methods. In the system model, we propose a parallel training process of multiple jobs, and construct a cost model based on the training time and the data fairness of various devices during the training process of diverse jobs. We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost. We conduct extensive experimentation with multiple jobs and datasets. The experimental results show that our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
How to Cite
Zhou, C., Liu, J., Jia, J., Zhou, J., Zhou, Y., Dai, H., & Dou, D. (2022). Efficient Device Scheduling with Multi-Job Federated Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9), 9971-9979. https://doi.org/10.1609/aaai.v36i9.21235
AAAI Technical Track on Planning, Routing, and Scheduling