ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Authors

  • Ziqian Zeng South China University of Technology, China
  • Yihuai Hong South China University of Technology, China
  • Hongliang Dai Nanjing University of Aeronautics and Astronautics, China
  • Huiping Zhuang South China University of Technology, China
  • Cen Chen South China University of Technology, China; Pazhou Laboratory, China

DOI:

https://doi.org/10.1609/aaai.v38i17.29922

Keywords:

NLP: (Large) Language Models, ML: Learning on the Edge & Model Compression

Abstract

Early exiting is one of the most popular methods for achieving efficient inference. Current early exiting methods adopt the (weighted) sum of the cross-entropy losses of all internal classifiers as the training objective, which requires every one of these classifiers to predict every instance correctly. During inference, however, an instance can exit early without losing accuracy as long as a single internal classifier predicts it correctly. Thus, there is a notable gap between training and inference. We propose ConsistentEE, an early exiting method that is consistent between training and inference. ConsistentEE formulates the early exiting process as a reinforcement learning problem: a policy network is added to decide whether an instance should exit or continue. The training objective of ConsistentEE only requires each instance to be predicted correctly by one internal classifier. Additionally, we introduce the concept of the "Memorized Layer" to measure the hardness of an instance. We incorporate the memorized layer into the reward function design, which allows "easy" instances to focus more on acceleration and "hard" instances to focus more on accuracy. Experimental results show that our method outperforms other baselines on various natural language understanding and generation tasks, using PLMs and LLMs as backbones, respectively.
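To make the training/inference gap concrete, the following Python sketch contrasts the conventional objective (a weighted sum of cross-entropy losses over all internal classifiers) with a policy-based expected reward in which an instance is scored only at the layer where it exits. It is an illustration under stated assumptions, not ConsistentEE's implementation: the per-layer exit probabilities stand in for the policy network's outputs, the last layer is assumed to always exit, and the reward r_l = log p_l(y) - alpha * l / L, where alpha shrinks as the memorized layer (taken here as a given 1-based index) grows, is a made-up hardness-aware form chosen only for illustration. The helper names weighted_ce_loss, exit_distribution and expected_reward are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the ideas in the abstract; NOT the
# authors' code or their exact reward definition.


def weighted_ce_loss(layer_probs: Sequence[Sequence[float]], label: int,
                     weights: Sequence[float]) -> float:
    """Conventional early-exit objective for one instance: a weighted sum of
    the cross-entropy losses of ALL internal classifiers."""
    return sum(w * -math.log(p[label]) for w, p in zip(weights, layer_probs))


def exit_distribution(exit_probs: Sequence[float]) -> List[float]:
    """P(exit at layer l) from per-layer exit probabilities of a policy.
    Assumption: the last layer always exits."""
    dist: List[float] = []
    stay = 1.0  # probability of having reached the current layer
    for i, pi in enumerate(exit_probs):
        p_exit = 1.0 if i == len(exit_probs) - 1 else pi
        dist.append(stay * p_exit)
        stay *= 1.0 - p_exit
    return dist


def expected_reward(layer_probs: Sequence[Sequence[float]], label: int,
                    exit_probs: Sequence[float], memorized_layer: int,
                    alpha_max: float = 1.0) -> float:
    """Expected reward under the exit policy, with a HYPOTHETICAL
    hardness-aware reward r_l = log p_l(label) - alpha * l / L.
    memorized_layer is a 1-based index in [1, L]; a larger index means a
    harder instance, so alpha (the cost of depth) shrinks toward 0."""
    num_layers = len(layer_probs)
    hardness = memorized_layer / num_layers   # in (0, 1]
    alpha = alpha_max * (1.0 - hardness)      # easy -> favor acceleration
    dist = exit_distribution(exit_probs)
    # Each layer's reward is weighted by the probability of exiting there,
    # so a sampled trajectory is scored only at its exit layer.
    return sum(d * (math.log(p[label]) - alpha * (l + 1) / num_layers)
               for l, (d, p) in enumerate(zip(dist, layer_probs)))


if __name__ == "__main__":
    probs = [[0.6, 0.4], [0.8, 0.2], [0.95, 0.05]]  # p_l(class), layers 1..3
    early, late = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]  # exit at layer 1 vs 3
    # Easy instance (memorized at layer 1): exiting early earns more reward.
    print(expected_reward(probs, 0, early, 1, alpha_max=2.0)
          > expected_reward(probs, 0, late, 1, alpha_max=2.0))  # True
    # Hard instance (memorized at layer 3): waiting for layer 3 earns more.
    print(expected_reward(probs, 0, late, 3, alpha_max=2.0)
          > expected_reward(probs, 0, early, 3, alpha_max=2.0))  # True
```

Because the expected reward scores an instance only at its exit layer, a correct prediction by one internal classifier suffices, whereas weighted_ce_loss penalizes every layer. Training such a policy would typically use a policy-gradient method such as REINFORCE; the sketch only evaluates the objective.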


Published

2024-03-24

How to Cite

Zeng, Z., Hong, Y., Dai, H., Zhuang, H., & Chen, C. (2024). ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19506-19514. https://doi.org/10.1609/aaai.v38i17.29922

Issue

Vol. 38 No. 17 (2024)

Section

AAAI Technical Track on Natural Language Processing II