Preparing Lessons for Progressive Training on Language Models

Authors

  • Yu Pan, Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China
  • Ye Yuan, School of Computer Science, Peking University, Beijing, China; Peking University-Anker Embodied AI Lab
  • Yichun Yin, Huawei Noah’s Ark Lab, Shenzhen, Guangdong, China
  • Jiaxin Shi, Cloud BU, Huawei Technologies
  • Zenglin Xu, Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China; Pengcheng Laboratory, Shenzhen, China
  • Ming Zhang, School of Computer Science, Peking University, Beijing, China; Peking University-Anker Embodied AI Lab
  • Lifeng Shang, Huawei Noah’s Ark Lab, Shenzhen, Guangdong, China
  • Xin Jiang, Huawei Noah’s Ark Lab, Shenzhen, Guangdong, China
  • Qun Liu, Huawei Noah’s Ark Lab, Shenzhen, Guangdong, China

DOI:

https://doi.org/10.1609/aaai.v38i17.29851

Keywords:

NLP: (Large) Language Models, NLP: Learning & Optimization for NLP

Abstract

The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes. Prior work suggests using pretrained small models to improve training efficiency, but this approach may not be suitable for new model structures. On the other hand, training from scratch can be slow, and progressively stacking layers often fails to achieve significant acceleration. To address these challenges, we propose a novel method called Apollo, which prepares lessons for expanding operations by learning high-layer functionality during the training of low layers. Our approach involves low-value-prioritized sampling (LVPS) to train at different depths and weight sharing to facilitate efficient expansion. We also introduce an interpolation method for stable model depth extension. Experiments demonstrate that Apollo achieves state-of-the-art acceleration ratios, even rivaling methods that use pretrained models, making it a universal and efficient solution for training deep models while reducing time, financial, and environmental costs.
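
The sketch below is a minimal, hypothetical illustration of the two ideas the abstract names: sampling the forward depth with a distribution that prioritizes low values, and extending depth by initializing new positions from already-trained layers via weight sharing. The layer type (TransformerEncoderLayer), the 1/d sampling weights, and the nearest-layer expansion mapping are assumptions made for illustration, not the paper's exact implementation; see the published paper at the DOI above for the actual LVPS distribution and interpolation scheme.

```python
# Illustrative sketch only: LVPS-style depth sampling and depth expansion
# through weight sharing. Names and choices here are assumptions, not the
# Apollo implementation.
import copy
import random

import torch
import torch.nn as nn


class ProgressiveStack(nn.Module):
    """A small stack of Transformer layers that can be trained at sampled
    depths and later expanded into a deeper model."""

    def __init__(self, hidden, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )

    def sample_depth(self):
        # LVPS-style sampling (assumed form): shallower depths are drawn more
        # often, so low layers get most of the updates while higher layers
        # still receive occasional "lessons".
        depths = list(range(1, len(self.layers) + 1))
        weights = [1.0 / d for d in depths]  # prioritize low values
        return random.choices(depths, weights=weights, k=1)[0]

    def forward(self, x, depth=None):
        depth = depth if depth is not None else self.sample_depth()
        for layer in self.layers[:depth]:
            x = layer(x)
        return x

    def expand(self, new_num_layers):
        # Depth extension via weight sharing (assumed scheme): each new
        # position is initialized from the nearest trained layer, so the
        # expanded model starts close to the shallow one.
        old = list(self.layers)
        mapped = [old[min(i * len(old) // new_num_layers, len(old) - 1)]
                  for i in range(new_num_layers)]
        self.layers = nn.ModuleList([copy.deepcopy(m) for m in mapped])


if __name__ == "__main__":
    model = ProgressiveStack(hidden=64, num_layers=4)
    x = torch.randn(2, 16, 64)          # (batch, sequence, hidden)
    y = model(x)                        # forward pass at an LVPS-sampled depth
    model.expand(new_num_layers=8)      # grow the model partway through training
    y = model(x, depth=8)
    print(y.shape)                      # torch.Size([2, 16, 64])
```

The deepcopy-based mapping above only stands in for the weight-sharing idea; the paper's interpolation method for stable depth extension is more involved than this nearest-layer copy.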

Published

2024-03-24

How to Cite

Pan, Y., Yuan, Y., Yin, Y., Shi, J., Xu, Z., Zhang, M., Shang, L., Jiang, X., & Liu, Q. (2024). Preparing Lessons for Progressive Training on Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18860-18868. https://doi.org/10.1609/aaai.v38i17.29851

Issue

Vol. 38 No. 17 (2024)

Section

AAAI Technical Track on Natural Language Processing II