Adaptive Computation Modules: Granular Conditional Computation for Efficient Inference

Authors

  • Bartosz Wójcik, IDEAS NCBR; Jagiellonian University
  • Alessio Devoto, Sapienza University of Rome
  • Karol Pustelnik, University of Warsaw
  • Pasquale Minervini, University of Edinburgh; Miniml.AI
  • Simone Scardapane, Sapienza University of Rome

DOI

https://doi.org/10.1609/aaai.v39i20.35453

Abstract

While transformer models have been highly successful, they are computationally inefficient. We observe that, for each layer, the full width of the layer may be needed only for a small subset of the tokens in a batch, and that the "effective" width needed to process a token can vary from layer to layer. Motivated by this observation, we introduce the Adaptive Computation Module (ACM), a generic module that dynamically adapts its computational load to match the estimated difficulty of the input on a per-token basis. An ACM consists of a sequence of learners that progressively refine the output of their preceding counterparts. An additional gating mechanism determines the optimal number of learners to execute for each token. We also propose a distillation technique to replace any pre-trained model with an "ACMized" variant. Our evaluation of transformer models in computer vision and speech recognition demonstrates that substituting layers with ACMs significantly reduces inference costs without degrading downstream accuracy across a wide range of user-defined budgets.
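The mechanism described in the abstract (a sequence of learners plus a per-token gate) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: each learner is a random linear map, "progressive refinement" is modelled as summing the outputs of the first k learners, the gate is a linear scorer whose argmax picks k between 0 and the number of learners, and the names `ToyACM`, `make_linear`, and `matvec` are hypothetical. Training, distillation, and budget control are omitted.

```python
import random


def make_linear(d_in, d_out, rng):
    """Random weight matrix (list of rows) standing in for a trained linear map."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(d_out)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ToyACM:
    """Toy adaptive module: a gate picks how many learners run for each token."""

    def __init__(self, dim, num_learners, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Assumption: every learner is a small linear map of the same shape.
        self.learners = [make_linear(dim, dim, rng) for _ in range(num_learners)]
        # Assumption: the gate emits one logit per possible count 0..num_learners.
        self.gate = make_linear(dim, num_learners + 1, rng)

    def choose_count(self, token):
        """Return the learner count with the highest gate logit."""
        logits = matvec(self.gate, token)
        return max(range(len(logits)), key=logits.__getitem__)

    def forward_token(self, token, count=None):
        """Run the first k learners on one token and sum their outputs."""
        k = self.choose_count(token) if count is None else count
        out = [0.0] * self.dim
        for w in self.learners[:k]:  # learners beyond k are never executed
            out = [o + c for o, c in zip(out, matvec(w, token))]
        return out, k

    def forward(self, tokens):
        """Process each token independently; cost varies with the chosen k."""
        return [self.forward_token(t) for t in tokens]


if __name__ == "__main__":
    acm = ToyACM(dim=4, num_learners=3)
    batch = [[1.0, 0.0, -1.0, 0.5], [0.2, 0.2, 0.2, 0.2]]
    for output, used in acm.forward(batch):
        print(f"learners used: {used}, output: {output}")
```

In this sketch the saving comes from the loop over `self.learners[:k]`: a token assigned a small k skips the remaining learners entirely, which mirrors the per-token granularity the abstract describes.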

Published

2025-04-11

How to Cite

Wójcik, B., Devoto, A., Pustelnik, K., Minervini, P., & Scardapane, S. (2025). Adaptive Computation Modules: Granular Conditional Computation for Efficient Inference. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 21510–21518. https://doi.org/10.1609/aaai.v39i20.35453

Issue

Vol. 39 No. 20 (2025)

Section

AAAI Technical Track on Machine Learning VI