Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping

Yujie Zeng; Wenlong He; Ihor Vasyltsov; Jiali Pang; Lin Chen

doi:10.1609/aaai.v37i9.26321

Authors

Yujie Zeng Samsung, SRCX
Wenlong He Samsung, SRCX
Ihor Vasyltsov Samsung, SAIT
Jiali Pang Samsung, SRCX
Lin Chen Samsung, SRCX

DOI:

https://doi.org/10.1609/aaai.v37i9.26321

Keywords:

ML: Distributed Machine Learning & Federated Learning, ML: Optimization, SNLP: Language Models, SNLP: Learning & Optimization for SNLP

Abstract

Transformer models are widely used in AI applications such as Natural Language Processing (NLP) and Computer Vision (CV). However, their enormous computational workload is an obstacle to training large transformer models efficiently. Some recent methods reduce the training workload by skipping layers. However, these methods use simple probability distributions and coarse-grained probability calculation, which significantly degrades model accuracy. To address this issue, we propose a novel method to accelerate training: Sensitivity-Based Layer Dropping (SBLD). SBLD uses layer-wise sensitivity data to switch transformer layers on and off in a proper order so that accuracy remains high. In addition, we adjust the probability of skipping transformer layers with a scheduler to speed up training and reach convergence faster. Our results show that SBLD solves the accuracy drop issue of prior layer dropping methods. SBLD decreases end-to-end training time by 19.67% when training the GPT-3 Medium model while increasing accuracy by 1.65% relative to the baseline. Furthermore, for the SwinV2-L model, the obtained Top-1 and Top-5 accuracies are also higher than the baseline. Thus, the proposed method is an efficient and practical way to improve the training of large transformer models.
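
The abstract does not spell out how sensitivity and the scheduler combine, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a linear ramp for the skip rate, assumes that less sensitive layers should be skipped more often, and uses hypothetical names throughout (`drop_rate`, `skip_probabilities`, `active_layers`, `max_rate`).

```python
import random


def drop_rate(step, total_steps, max_rate=0.5):
    """Hypothetical linear scheduler: the average skip rate ramps
    from 0 at the first step up to max_rate at the last step."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_rate * frac


def skip_probabilities(sensitivity, rate):
    """Assign each layer a skip probability inversely proportional to its
    sensitivity score (an assumption, not taken from the paper).

    Before clipping to 1.0, the mean skip probability equals `rate`.
    """
    inv = [1.0 / (s + 1e-8) for s in sensitivity]
    scale = rate * len(inv) / sum(inv)
    return [min(1.0, scale * v) for v in inv]


def active_layers(sensitivity, step, total_steps, rng):
    """Sample the indices of the layers that stay switched on at this step."""
    probs = skip_probabilities(sensitivity, drop_rate(step, total_steps))
    return [i for i, p in enumerate(probs) if rng.random() >= p]


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [1.0, 2.0, 4.0, 8.0]  # made-up sensitivity scores, one per layer
    for step in (0, 500, 1000):
        print(step, active_layers(scores, step, 1000, rng))
```

In this sketch every layer runs at step 0 because the scheduled rate is zero, and the layers with the lowest made-up scores become the most likely to be skipped as training proceeds. How SBLD actually measures sensitivity and orders the layers is described in the paper itself.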
