Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space

Authors

  • Cheng Yan, University of Science and Technology of China
  • Wuyang Zhang, University of Science and Technology of China
  • Zhiyuan Ning, University of Science and Technology of China
  • Fan Xu, University of Science and Technology of China
  • Ziyang Tao, University of Science and Technology of China
  • Lu Zhang, Hefei Comprehensive National Science Center
  • Bing Yin, Research Department, iFLYTEK Co., LTD.
  • Yanyong Zhang, University of Science and Technology of China; Hefei Comprehensive National Science Center

DOI:

https://doi.org/10.1609/aaai.v40i43.40970

Abstract

The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" in which seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency.
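To make the decoupling idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: queries and model "profiles" live in a shared latent space, the router picks the cheapest model whose predicted competence clears a threshold, and onboarding a new model amounts to adding one profile vector, with no retraining. The encoder, scoring function, model names, and costs below are all hypothetical placeholders.

```python
import math

def embed_query(query: str, dim: int = 4) -> list:
    """Stand-in for a learned, model-agnostic difficulty encoder (hypothetical)."""
    h = abs(hash(query))
    vec = [((h >> (8 * i)) % 256) / 255.0 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def predicted_accuracy(query_vec, profile_vec) -> float:
    """Dot-product score as a proxy for 'this model handles this query well'."""
    return sum(q * p for q, p in zip(query_vec, profile_vec))

def route(query: str, models: dict, min_accuracy: float = 0.5) -> str:
    """Pick the cheapest model whose predicted accuracy clears the bar."""
    qv = embed_query(query)
    eligible = [
        (spec["cost"], name)
        for name, spec in models.items()
        if predicted_accuracy(qv, spec["profile"]) >= min_accuracy
    ]
    if not eligible:  # fall back to the strongest (here: most expensive) model
        return max(models, key=lambda n: models[n]["cost"])
    return min(eligible)[1]

# Hypothetical model pool: each entry is a latent profile plus a cost per call.
models = {
    "small-llm": {"profile": [0.9, 0.1, 0.1, 0.1], "cost": 0.1},
    "large-llm": {"profile": [0.5, 0.5, 0.5, 0.5], "cost": 1.0},
}
# Zero-shot onboarding: adding a model is one dictionary entry, no retraining.
models["new-llm"] = {"profile": [0.7, 0.3, 0.3, 0.3], "cost": 0.4}
```

Because the query representation never depends on any particular model, the only per-model knowledge the router needs is the profile vector, which is what makes the zero-shot onboarding claim possible in this toy setting.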

Published

2026-03-14

How to Cite

Yan, C., Zhang, W., Ning, Z., Xu, F., Tao, Z., Zhang, L., … Zhang, Y. (2026). Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36483–36490. https://doi.org/10.1609/aaai.v40i43.40970

Section

AAAI Technical Track on Planning, Routing, and Scheduling