Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping

Authors

  • Yujie Zeng (Samsung, SRCX)
  • Wenlong He (Samsung, SRCX)
  • Ihor Vasyltsov (Samsung, SAIT)
  • Jiali Pang (Samsung, SRCX)
  • Lin Chen (Samsung, SRCX)

DOI:

https://doi.org/10.1609/aaai.v37i9.26321

Keywords:

ML: Distributed Machine Learning & Federated Learning, ML: Optimization, SNLP: Language Models, SNLP: Learning & Optimization for SNLP

Abstract

Transformer models are widely used in AI applications such as Natural Language Processing (NLP) and Computer Vision (CV). However, the enormous computation workload is an obstacle to training large transformer models efficiently. Some recent methods reduce the computation workload during training by skipping some layers. However, these methods use a simple probability distribution and coarse-grained probability calculation, which significantly affect model accuracy. To address this issue, we propose a novel method to accelerate training: Sensitivity-Based Layer Dropping (SBLD). SBLD uses layer-wise sensitivity data to switch transformer layers on and off in the proper order to maintain high accuracy. In addition, we adjust the probability of skipping transformer layers with a scheduler to speed up training and achieve faster convergence. Our results show that SBLD solves the accuracy drop issue seen in prior layer dropping methods. SBLD decreases end-to-end training time by 19.67% when training the GPT-3 Medium model, while increasing accuracy by 1.65% relative to the baseline. Furthermore, for the SwinV2-L model, the obtained Top-1 and Top-5 accuracies are also higher than the baseline. Thus, the proposed method is an efficient and practical way to improve large transformer model training.
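The abstract does not give the algorithm itself, so the following is only a minimal sketch of the two ideas it names: a scheduler that sets how much to skip, and a sensitivity ranking that decides which layers to skip. The function names `scheduled_drop_rate` and `layers_to_skip`, the linear ramp, the `max_rate` parameter, and the rule "skip the least sensitive layers first" are all assumptions made for illustration, not the authors' method.

```python
from typing import List


def scheduled_drop_rate(step: int, total_steps: int, max_rate: float = 0.5) -> float:
    """Hypothetical linear scheduler: the drop rate ramps from 0 to max_rate.

    The paper's actual schedule is not stated in the abstract; a linear
    ramp is assumed here purely for illustration.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay valid.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_rate * progress


def layers_to_skip(sensitivities: List[float], drop_rate: float) -> List[int]:
    """Return indices of the layers to switch off for this step.

    Assumption: a higher sensitivity value means the layer matters more
    for accuracy, so the least sensitive layers are skipped first.
    """
    num_skip = int(drop_rate * len(sensitivities))
    # Rank layer indices from least to most sensitive.
    order = sorted(range(len(sensitivities)), key=lambda i: sensitivities[i])
    return sorted(order[:num_skip])


if __name__ == "__main__":
    # Made-up sensitivity values for a 4-layer toy model.
    toy_sensitivities = [0.9, 0.1, 0.5, 0.2]
    for step in (0, 50, 100):
        rate = scheduled_drop_rate(step, total_steps=100)
        print(step, rate, layers_to_skip(toy_sensitivities, rate))
```

In a real training loop, the returned indices would be used to bypass the corresponding transformer blocks in the forward pass for that step. A deterministic ranking is used here to keep the sketch easy to follow; a probabilistic variant, closer to the per-layer skip probability the abstract mentions, would sample each layer's on/off state instead.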

Published

2023-06-26

How to Cite

Zeng, Y., He, W., Vasyltsov, I., Pang, J., & Chen, L. (2023). Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 11156–11163. https://doi.org/10.1609/aaai.v37i9.26321

Section

AAAI Technical Track on Machine Learning IV