A Survey on Model Compression and Acceleration for Pretrained Language Models

Canwen Xu; Julian McAuley

doi:10.1609/aaai.v37i9.26255

Authors

Canwen Xu University of California, San Diego
Julian McAuley University of California, San Diego

Keywords:

ML: Learning on the Edge & Model Compression, SNLP: Language Models

Abstract

Despite their state-of-the-art performance on many NLP tasks, Transformer-based pretrained language models (PLMs) are held back from broader adoption, including in edge and mobile computing, by their high energy cost and long inference latency. Efficient NLP research aims to comprehensively account for computation, time, and carbon emissions across the entire life cycle of NLP, including data preparation, model training, and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics, and methodology.

Published

2023-06-26

How to Cite

Xu, C., & McAuley, J. (2023). A Survey on Model Compression and Acceleration for Pretrained Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 10566–10575. https://doi.org/10.1609/aaai.v37i9.26255

Issue

Vol. 37 No. 9: AAAI-23 Technical Tracks 9

Section

AAAI Technical Track on Machine Learning IV
