Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
DOI: https://doi.org/10.1609/aaai.v40i24.39094
Abstract
This paper presents a simple, effective, and cost-efficient strategy, named ModelSwitch, to improve LLM performance by scaling test-time compute. ModelSwitch builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using sample consistency as a signal, our strategy dynamically switches between models. Theoretical analysis highlights the efficiency and performance advantages of our strategy. Extensive experiments on seven datasets demonstrate that our strategy not only outperforms self-consistency and state-of-the-art multi-agent debate approaches, but also significantly reduces inference costs. Additionally, our strategy requires only a few comparable LLMs to achieve optimal performance and can be extended with verification methods, demonstrating the potential of leveraging multiple LLMs in the generation-verification paradigm.
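The abstract's core loop (sample repeatedly from one model, and switch to the next model only when the sampled answers are not consistent enough) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model interface (a callable returning one sampled answer per call), the sample count, and the consistency threshold are all hypothetical choices.

```python
from collections import Counter

def model_switch(models, query, samples_per_model=5, threshold=0.6):
    """Hypothetical sketch of the ModelSwitch idea.

    `models` is a list of callables (an assumed interface), each returning
    one sampled answer string per call. A model's answer is accepted once
    the fraction of its samples agreeing on a majority answer reaches
    `threshold`; otherwise we switch to the next model.
    """
    all_answers = []
    for model in models:
        answers = [model(query) for _ in range(samples_per_model)]
        all_answers.extend(answers)
        top_answer, count = Counter(answers).most_common(1)[0]
        if count / samples_per_model >= threshold:
            # Samples are consistent: stop early, saving further queries.
            return top_answer
    # No single model was consistent enough: vote over all samples so far.
    return Counter(all_answers).most_common(1)[0][0]
```

For example, a model whose five samples split 2/2/1 across three answers (consistency 0.4) would trigger a switch, while a model answering unanimously would be accepted immediately. Early stopping on consistent cases is what lets the multi-model scheme cut inference cost relative to fixed-budget repeated sampling.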
Published
2026-03-14
How to Cite
Chen, J., Xun, Z., Zhou, B., Qi, H., Zhang, H., Zhang, Q., Chen, Y., Hu, W., Qu, Y., & Hu, S. (2026). Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20083-20091. https://doi.org/10.1609/aaai.v40i24.39094
Section
AAAI Technical Track on Machine Learning I