Empowering Adaptive Early-Exit Inference with Latency Awareness

Authors

  • Xinrui Tan, Institute of Information Engineering, Chinese Academy of Sciences
  • Hongjia Li, Institute of Information Engineering, Chinese Academy of Sciences
  • Liming Wang, Institute of Information Engineering, Chinese Academy of Sciences
  • Xueqing Huang, New York Institute of Technology
  • Zhen Xu, Institute of Information Engineering, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v35i11.17181

Keywords:

Learning on the Edge & Model Compression, (Deep) Neural Network Algorithms

Abstract

With its ability to trade accuracy for latency on the fly, adaptive early-exit inference has emerged as a promising line of research for accelerating deep learning inference. However, studies in this line commonly use a group of thresholds to control the accuracy-latency trade-off, yet no thorough and general methodology for determining these thresholds has been developed, especially with regard to the common requirements on average inference latency. To address this issue and enable latency-aware adaptive early-exit inference, we approximately formulate the threshold determination problem as finding the accuracy-maximizing threshold setting that meets a given average latency requirement, and then propose a threshold determination method to tackle this non-convex problem. Theoretically, we prove that, for certain parameter settings, our method finds an approximate stationary point of the formulated problem. Empirically, on various models across multiple datasets (CIFAR-10, CIFAR-100, ImageNet, and two time-series datasets), we show that our method handles average latency requirements well and consistently finds good threshold settings in negligible time.
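To make the quantities in the abstract concrete, the sketch below replays threshold-based early exiting offline on pre-recorded per-exit outputs and reports the accuracy and average latency of a given threshold setting. It assumes each sample's confidence at each exit (for example, its maximum softmax probability) and whether that exit's prediction is correct have already been recorded, and that `exit_latency[k]` is the cumulative, non-decreasing latency to reach exit k. The search routine is only a naive baseline (one threshold shared by all exits, chosen by bisection to fit a latency budget); it is not the authors' method, which tackles an approximate non-convex formulation over the whole threshold setting. All function names here are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def simulate_early_exit(
    confidences: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
    exit_latency: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[float, float]:
    """Replay threshold-based early exiting on recorded per-exit outputs.

    confidences[i][k]: confidence of sample i at exit k (assumed, e.g. max softmax).
    correct[i][k]:     whether exit k classifies sample i correctly.
    exit_latency[k]:   cumulative latency to reach and evaluate exit k.
    thresholds[k]:     threshold of exit k; the final exit has none and always fires.
    Returns (accuracy, average latency) over all samples.
    """
    num_exits = len(exit_latency)
    if len(thresholds) != num_exits - 1:
        raise ValueError("need one threshold per non-final exit")
    hits = 0
    total_latency = 0.0
    for conf_i, corr_i in zip(confidences, correct):
        k = 0
        # Leave at the first exit whose confidence reaches its threshold.
        while k < num_exits - 1 and conf_i[k] < thresholds[k]:
            k += 1
        hits += int(corr_i[k])
        total_latency += exit_latency[k]
    n = len(confidences)
    return hits / n, total_latency / n


def largest_shared_threshold(
    confidences, correct, exit_latency, budget: float, iters: int = 50
) -> Optional[float]:
    """Naive baseline: one threshold t shared by all exits, found by bisection.

    Raising t can only move each sample to the same or a later exit, so with
    non-decreasing exit_latency the average latency is non-decreasing in t.
    Returns the largest t in [0, 1] whose average latency is within `budget`,
    or None if even t = 0 (every sample leaves at the first exit) is too slow.
    """
    num_thr = len(exit_latency) - 1

    def avg_latency(t: float) -> float:
        return simulate_early_exit(confidences, correct, exit_latency, [t] * num_thr)[1]

    lo, hi = 0.0, 1.0
    if avg_latency(lo) > budget:
        return None
    if avg_latency(hi) <= budget:
        return hi
    # Invariant: avg_latency(lo) <= budget < avg_latency(hi).
    for _ in range(iters):
        mid = (lo + hi) / 2
        if avg_latency(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Toy data: 4 samples, 2 exits with cumulative latencies 1.0 and 3.0.
    conf = [[0.9, 0.95], [0.4, 0.8], [0.6, 0.99], [0.2, 0.7]]
    corr = [[True, True], [False, True], [False, True], [False, False]]
    lat = [1.0, 3.0]
    t = largest_shared_threshold(conf, corr, lat, budget=2.0)
    print(simulate_early_exit(conf, corr, lat, [t]))  # (0.5, 2.0)
```

A shared threshold makes the latency constraint easy to satisfy with a one-dimensional search, but it ignores the accuracy gains available from tuning each exit separately, which is the harder, non-convex problem the paper addresses.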

Published

2021-05-18

How to Cite

Tan, X., Li, H., Wang, L., Huang, X., & Xu, Z. (2021). Empowering Adaptive Early-Exit Inference with Latency Awareness. Proceedings of the AAAI Conference on Artificial Intelligence, 35(11), 9825-9833. https://doi.org/10.1609/aaai.v35i11.17181

Issue

Vol. 35 No. 11

Section

AAAI Technical Track on Machine Learning IV