CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices

Cheng Tang; Guochong Sui; Wenqi Lou; Zihan Wang; Jiayi Tuo; Wenqian Xie; Yinkang Gao; Yixuan Zhu; Lei Gong; Chao Wang; Xuehai Zhou

doi:10.1609/aaai.v40i30.39779

Authors

Cheng Tang, University of Science and Technology of China
Guochong Sui, University of Science and Technology of China
Wenqi Lou, University of Science and Technology of China; Suzhou Institute of Advanced Research, University of Science and Technology of China
Zihan Wang, University of Science and Technology of China
Jiayi Tuo, University of Science and Technology of China
Wenqian Xie, Duke University
Yinkang Gao, University of Science and Technology of China
Yixuan Zhu, University of Science and Technology of China
Lei Gong, University of Science and Technology of China
Chao Wang, University of Science and Technology of China; Suzhou Institute of Advanced Research, University of Science and Technology of China
Xuehai Zhou, University of Science and Technology of China

Abstract

Hardware accelerators such as GPUs, NPUs, and FPGAs are essential to meeting AI’s computational demands. With the proliferation of heterogeneous devices across cloud and edge, various model optimization techniques adapt to diverse hardware characteristics through operator transformations and structural modifications. Accurate, efficient latency prediction enables rapid selection of optimal strategies across hardware backends. Many existing methods treat hardware as a black-box executor, directly regressing latency without explicitly modeling the intricate interactions between neural network (NN) structures and device-specific execution behaviors. To address this limitation, we introduce a new modeling perspective that captures the interaction between neural architectures and hardware execution. To capture device-specific characteristics, we propose two complementary modeling strategies. The Device Behavior Signature Selector (DBSel) characterizes hardware execution behavior by selectively probing a small set of representative architectures, forming a compact, workload-driven profile. In parallel, we construct capability vectors that capture each device's memory hierarchy and compute characteristics, providing a structured abstraction of its architectural capacity. To unify the behavioral and structural views, we introduce the Hardware–Operation Dialogue Module (HODM), which models fine-grained interactions between neural operators and hardware properties. Together, these components enable CloserToMe to deliver accurate and transferable latency predictions across diverse, previously unseen platforms.
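
The abstract describes two device views (a behavior profile built from probe measurements and a capability vector) that are combined with operator-level information. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function names, the normalization of probe latencies, the three-element capability vector, and the use of scaled dot-product attention as the interaction mechanism are all assumptions made here for illustration, and all numeric values are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def device_signature(probe_latencies_ms):
    """Hypothetical behavior profile: probe latencies normalized to sum to 1."""
    total = sum(probe_latencies_ms)
    return [t / total for t in probe_latencies_ms]


def cross_attend(op_embeddings, hw_tokens):
    """Each operator embedding (query) attends over hardware tokens.

    Hardware tokens serve as both keys and values, so every output row
    is a convex combination of the hardware tokens.
    """
    dim = len(hw_tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for q in op_embeddings:
        weights = softmax([dot(q, k) / scale for k in hw_tokens])
        mixed = [
            sum(w * tok[i] for w, tok in zip(weights, hw_tokens))
            for i in range(dim)
        ]
        fused.append(mixed)
    return fused


# Made-up latencies (ms) of three probe architectures on one device.
signature = device_signature([2.0, 6.0, 12.0])

# Made-up capability vector (assumed to be pre-normalized features such as
# memory bandwidth, cache capacity, and peak compute).
capability = [0.5, 0.2, 0.3]

# Two toy operator embeddings of the same dimension as the hardware tokens.
op_embeddings = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

fused = cross_attend(op_embeddings, [signature, capability])
for row in fused:
    print([round(v, 3) for v in row])
```

In a real predictor, the fused operator representations would feed a learned regressor; this sketch stops at the interaction step and contains no trained parameters.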
