AdaReason: Progressive Training of Multi-LoRA Adapters for Budget-Adaptive Language Reasoning Models
DOI:
https://doi.org/10.1609/aaai.v40i31.39828
Abstract
Large reasoning models (LRMs) have demonstrated remarkable capabilities in solving complex problems through extended chain-of-thought reasoning. However, existing approaches face a fundamental trade-off between computational efficiency and reasoning accuracy. Current methods either lack support for user-specified computational budgets or require maintaining multiple independent models, leading to significant resource overhead. In this paper, we present AdaReason, a unified framework that trains a single base model to support arbitrary user-defined computational budgets through dynamic adapter composition. Our approach introduces three key innovations: (1) a length-adaptive step reward function that stabilizes training across diverse budget constraints, (2) a progressive training strategy that gradually tightens computational bounds while maintaining model performance, and (3) a runtime adapter merging mechanism that dynamically interpolates between different computational preferences. Unlike existing methods that suffer from training instability in large context windows, AdaReason achieves stable convergence through careful reward shaping and progressive constraint tightening. Additionally, we provide a rigorous theoretical analysis, establishing a performance bound for our merged model. Experiments on different reasoning benchmarks demonstrate that AdaReason establishes a new state-of-the-art in the performance-efficiency trade-off and enables flexible runtime budget adaptation.
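The abstract does not specify how the runtime adapter merging is computed. As an illustration only, the sketch below assumes the simplest possible rule: a weighted sum of the low-rank updates of two LoRA adapters added to a shared base weight, with the mixing weights derived linearly from the requested budget. The function names `merge_lora` and `budget_to_weights`, the linear mapping, and the example budgets are hypothetical and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, adapters, weights):
    """Return base + sum_i weights[i] * (B_i @ A_i).

    base:     d_out x d_in matrix (list of rows).
    adapters: list of (B, A) pairs, B is d_out x r and A is r x d_in.
    weights:  one mixing coefficient per adapter.
    Assumption: plain linear interpolation; the paper's rule may differ.
    """
    if len(adapters) != len(weights):
        raise ValueError("one weight per adapter is required")
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # low-rank update contributed by this adapter
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


def budget_to_weights(budget, low, high):
    """Hypothetical mapping from a token budget to two mixing weights.

    Places `budget` linearly between the budgets the two adapters are
    assumed to have been trained for, clamped to [0, 1].
    """
    t = min(max((budget - low) / (high - low), 0.0), 1.0)
    return [1.0 - t, t]


# Toy example: a 2x2 base weight and two rank-1 adapters.
base = [[1.0, 0.0], [0.0, 1.0]]
short_adapter = ([[1.0], [0.0]], [[1.0, 0.0]])  # assumed trained for a 2000-token budget
long_adapter = ([[0.0], [1.0]], [[0.0, 2.0]])   # assumed trained for a 4000-token budget

weights = budget_to_weights(3000, low=2000, high=4000)
merged = merge_lora(base, [short_adapter, long_adapter], weights)
```

Because the merge is a weighted sum of weight deltas, a single base model plus a small set of adapters can in principle serve any budget between the trained endpoints without storing a separate model per budget, which is the resource saving the abstract describes.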
Published
2026-03-14
How to Cite
Wang, J., Chen, T., Cheng, P., Hou, X., & Liu, J. (2026). AdaReason: Progressive Training of Multi-LoRA Adapters for Budget-Adaptive Language Reasoning Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 26242–26250. https://doi.org/10.1609/aaai.v40i31.39828
Section
AAAI Technical Track on Machine Learning VIII