CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices
DOI:
https://doi.org/10.1609/aaai.v40i30.39779
Abstract
Hardware accelerators such as GPUs, NPUs, and FPGAs are essential to meeting AI’s computational demands. With the proliferation of heterogeneous devices across cloud and edge, model optimization techniques adapt networks to diverse hardware characteristics through operator transformations and structural modifications. Accurate, efficient latency prediction enables rapid selection of the best strategy for each hardware backend. Many existing methods, however, treat hardware as a black-box executor and regress latency directly, without explicitly modeling the intricate interactions between neural network (NN) structures and device-specific execution behaviors. To address this limitation, we introduce a new modeling perspective that captures the interaction between neural architectures and hardware execution. To capture device-specific characteristics, we propose two complementary modeling strategies. The Device Behavior Signature Selector (DBSel) characterizes hardware execution behavior by selectively probing a small set of representative architectures, forming a compact, workload-driven profile. In parallel, we construct capability vectors that capture each device's memory hierarchy and compute characteristics, providing a structured abstraction of its architectural capacity. To unify the behavioral and structural views, we introduce the Hardware–Operation Dialogue Module (HODM), which models fine-grained interactions between neural operators and hardware properties. Together, these components enable CloserToMe to deliver accurate and transferable latency predictions across diverse, previously unseen platforms.
Published
2026-03-14
How to Cite
Tang, C., Sui, G., Lou, W., Wang, Z., Tuo, J., Xie, W., … Zhou, X. (2026). CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 25805–25813. https://doi.org/10.1609/aaai.v40i30.39779
Issue
Vol. 40 No. 30 (2026)
Section
AAAI Technical Track on Machine Learning VII