Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space

Authors

  • Cheng Yan, University of Science and Technology of China
  • Wuyang Zhang, University of Science and Technology of China
  • Zhiyuan Ning, University of Science and Technology of China
  • Fan Xu, University of Science and Technology of China
  • Ziyang Tao, University of Science and Technology of China
  • Lu Zhang, Hefei Comprehensive National Science Center
  • Bing Yin, Research Department, iFLYTEK Co., LTD.
  • Yanyong Zhang, University of Science and Technology of China; Hefei Comprehensive National Science Center

DOI:

https://doi.org/10.1609/aaai.v40i43.40970

Abstract

The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" in which seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency.
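To make the decoupling idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: queries and model "profiles" live in a shared latent space, the router picks the cheapest model whose predicted competence clears a threshold, and onboarding a new model amounts to adding one profile vector, with no retraining. The encoder, scoring function, model names, and costs below are all hypothetical placeholders.

```python
import math

def embed_query(query: str, dim: int = 4) -> list:
    """Stand-in for a learned, model-agnostic difficulty encoder (hypothetical)."""
    h = abs(hash(query))
    vec = [((h >> (8 * i)) % 256) / 255.0 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def predicted_accuracy(query_vec, profile_vec) -> float:
    """Dot-product score as a proxy for 'this model handles this query well'."""
    return sum(q * p for q, p in zip(query_vec, profile_vec))

def route(query: str, models: dict, min_accuracy: float = 0.5) -> str:
    """Pick the cheapest model whose predicted accuracy clears the bar."""
    qv = embed_query(query)
    eligible = [
        (spec["cost"], name)
        for name, spec in models.items()
        if predicted_accuracy(qv, spec["profile"]) >= min_accuracy
    ]
    if not eligible:  # fall back to the strongest (here: most expensive) model
        return max(models, key=lambda n: models[n]["cost"])
    return min(eligible)[1]

# Hypothetical model pool: each entry is a latent profile plus a cost per call.
models = {
    "small-llm": {"profile": [0.9, 0.1, 0.1, 0.1], "cost": 0.1},
    "large-llm": {"profile": [0.5, 0.5, 0.5, 0.5], "cost": 1.0},
}
# Zero-shot onboarding: adding a model is one dictionary entry, no retraining.
models["new-llm"] = {"profile": [0.7, 0.3, 0.3, 0.3], "cost": 0.4}
```

Because the query representation never depends on any particular model, the only per-model knowledge the router needs is the profile vector, which is what makes the zero-shot onboarding claim possible in this toy setting.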

Published

2026-03-14

How to Cite

Yan, C., Zhang, W., Ning, Z., Xu, F., Tao, Z., Zhang, L., … Zhang, Y. (2026). Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36483–36490. https://doi.org/10.1609/aaai.v40i43.40970

Section

AAAI Technical Track on Planning, Routing, and Scheduling