CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices

Authors

  • Cheng Tang, University of Science and Technology of China
  • Guochong Sui, University of Science and Technology of China
  • Wenqi Lou, University of Science and Technology of China; Suzhou Institute of Advanced Research, University of Science and Technology of China
  • Zihan Wang, University of Science and Technology of China
  • Jiayi Tuo, University of Science and Technology of China
  • Wenqian Xie, Duke University
  • Yinkang Gao, University of Science and Technology of China
  • Yixuan Zhu, University of Science and Technology of China
  • Lei Gong, University of Science and Technology of China
  • Chao Wang, University of Science and Technology of China; Suzhou Institute of Advanced Research, University of Science and Technology of China
  • Xuehai Zhou, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i30.39779

Abstract

Hardware accelerators such as GPUs, NPUs, and FPGAs are essential to meeting AI’s computational demands. With the proliferation of heterogeneous devices across cloud and edge, many model optimization techniques adapt models to diverse hardware characteristics through operator transformations and structural modifications. Accurate and efficient latency prediction enables the rapid selection of optimal strategies across hardware backends. Many existing methods treat hardware as a black-box executor, directly regressing latency without explicitly modeling the intricate interactions between neural network structures and device-specific execution behaviors. To address this limitation, we introduce a new modeling perspective that captures the interaction between neural architectures and hardware execution. To capture device-specific characteristics, we propose two complementary modeling strategies. The Device Behavior Signature Selector (DBSel) characterizes a device's execution behavior by probing it with a small, selected set of representative architectures, forming a compact, workload-driven profile. In parallel, we construct capability vectors that encode each device's memory hierarchy and compute characteristics, providing a structured abstraction of its architectural capacity. To unify the behavioral and structural views, we introduce the Hardware–Operation Dialogue Module (HODM), which models fine-grained interactions between neural operators and hardware properties. Together, these components enable CloserToMe to deliver accurate latency predictions that transfer to diverse and previously unseen platforms.
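The abstract does not say how DBSel chooses its probe architectures. As a rough illustration only, the following sketch swaps in greedy farthest-point (k-center) selection, a standard coverage heuristic, over hypothetical per-architecture feature vectors. The measured log-latencies of the chosen probes then form a compact behavior signature, which is concatenated with a capability vector (also hypothetical here, e.g. peak compute and memory bandwidth). The names select_probes, behavior_signature, and device_descriptor are illustrative assumptions, not the paper's API.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical: each candidate architecture is summarized by a numeric
# feature vector (e.g. FLOPs, parameter count, depth). The paper's actual
# architecture encoding is not described in the abstract.
Features = Sequence[float]


def euclidean(a: Features, b: Features) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_probes(pool: Dict[str, Features], k: int) -> List[str]:
    """Greedy farthest-point (k-center) selection of k probe architectures.

    Assumed stand-in for DBSel's selector: start from the first entry in
    the pool (insertion order), then repeatedly add the architecture that
    is farthest from every probe chosen so far.
    """
    names = list(pool)
    if k <= 0 or not names:
        return []
    chosen = [names[0]]
    # Distance from each candidate to its nearest already-chosen probe.
    nearest = {n: euclidean(pool[n], pool[names[0]]) for n in names}
    while len(chosen) < min(k, len(names)):
        nxt = max((n for n in names if n not in chosen), key=lambda n: nearest[n])
        chosen.append(nxt)
        for n in names:
            nearest[n] = min(nearest[n], euclidean(pool[n], pool[nxt]))
    return chosen


def behavior_signature(probes: List[str], measure: Callable[[str], float]) -> List[float]:
    """Measure each probe once on the target device; log-latencies form the signature."""
    return [math.log(measure(name)) for name in probes]


def device_descriptor(signature: Sequence[float], capability: Sequence[float]) -> List[float]:
    """Hypothetical fusion: concatenate the behavioral and structural views."""
    return list(signature) + list(capability)
```

With a pool such as {"tiny": [1.0, 1.0], "small": [1.2, 1.1], "wide": [8.0, 1.0], "deep": [1.0, 9.0]} and k = 3, the selector skips "small" because it nearly duplicates "tiny", so only architectures that add coverage are profiled on the device. This is one simple way to keep the profile compact; a learned predictor would still be needed to relate the descriptor to per-operator latency, which the abstract attributes to HODM.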


Published

2026-03-14

How to Cite

Tang, C., Sui, G., Lou, W., Wang, Z., Tuo, J., Xie, W., … Zhou, X. (2026). CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 25805–25813. https://doi.org/10.1609/aaai.v40i30.39779


Section

AAAI Technical Track on Machine Learning VII