MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices

Authors

  • Youpeng Zhao, Department of Computer Science, University of Central Florida
  • Ming Lin, Independent Researcher
  • Huadong Tang, School of Electrical and Data Engineering, University of Technology Sydney
  • Qiang Wu, School of Electrical and Data Engineering, University of Technology Sydney
  • Jun Wang, Department of Computer Science, University of Central Florida

DOI:

https://doi.org/10.1609/aaai.v39i21.34445

Abstract

Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial effort and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on a CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks compared to the 350M-parameter OPT, while being 4.9x faster on NVIDIA Jetson Nano with a 5.5x reduction in model size.

Published

2025-04-11

How to Cite

Zhao, Y., Lin, M., Tang, H., Wu, Q., & Wang, J. (2025). MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22840–22848. https://doi.org/10.1609/aaai.v39i21.34445

Issue

Vol. 39 No. 21 (2025)

Section

AAAI Technical Track on Machine Learning VII