Outlier Matters: Efficient Long-to-Short Reasoning via Outlier-Guided Model Merging
DOI: https://doi.org/10.1609/aaai.v40i41.40828
Abstract
Large Reasoning Language Models (LRMs) have recently shown remarkable performance in complex reasoning tasks, but their extensive reasoning chains incur substantial computational overhead. To address this challenge, we propose Outlier-aware Reasoning Conciseness Adaptive Merge (ORCA), a novel plug-and-play model merging framework that leverages outlier activation patterns to fuse base models with reasoning models. ORCA introduces three key innovations: (1) adaptive alignment that reduces conflicts between disparate activation patterns during merging, (2) outlier-guided allocation that assigns merging coefficients proportional to each layer's reasoning importance as indicated by outlier concentrations, and (3) dynamic probe-based adjustment that adapts merging coefficients during inference based on input-specific activation characteristics. These strategies allow seamless integration into existing merging pipelines while creating unified models that maintain reasoning accuracy with significantly reduced response verbosity. Comprehensive evaluation across six benchmarks using Qwen and LLaMA models shows ORCA reduces average response length by 55% while improving accuracy by 2.4–5.7% over existing methods.
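The abstract does not specify ORCA's exact formulation, but the outlier-guided allocation idea can be illustrated with a minimal sketch: compute a per-layer outlier score from calibration activations, normalize it into a merging coefficient, and linearly interpolate each layer's weights. All function names below, and the threshold-based outlier score, are hypothetical stand-ins, not the paper's actual method.

    import numpy as np

    def outlier_concentration(acts: np.ndarray, k: float = 3.0) -> float:
        # Hypothetical outlier score: fraction of activations lying more
        # than k standard deviations from the mean of this layer.
        mu, sigma = acts.mean(), acts.std()
        return float(np.mean(np.abs(acts - mu) > k * sigma))

    def merge_layer(w_base: np.ndarray, w_reason: np.ndarray,
                    alpha: float) -> np.ndarray:
        # Linear interpolation of one layer's weights; a larger alpha
        # keeps more of the reasoning model in that layer.
        return (1.0 - alpha) * w_base + alpha * w_reason

    def outlier_guided_merge(base_layers, reason_layers, calib_acts):
        # Merge layer by layer, with coefficients proportional to each
        # layer's outlier concentration on calibration activations.
        scores = np.array([outlier_concentration(a) for a in calib_acts])
        alphas = scores / (scores.max() + 1e-8)  # normalize to [0, 1]
        return [merge_layer(wb, wr, a)
                for wb, wr, a in zip(base_layers, reason_layers, alphas)]

    # Toy usage with random weights and heavy-tailed activations.
    rng = np.random.default_rng(0)
    base = [rng.normal(size=(4, 4)) for _ in range(3)]
    reason = [rng.normal(size=(4, 4)) for _ in range(3)]
    acts = [rng.standard_t(df=2, size=1024) for _ in range(3)]
    merged = outlier_guided_merge(base, reason, acts)
    print([m.shape for m in merged])

Layers whose calibration activations show heavier outlier concentration receive a coefficient closer to 1 and thus retain more of the reasoning model's weights, matching the abstract's description of allocation proportional to reasoning importance.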
Published
2026-03-14
How to Cite
Zhu, Q., Li, D., Li, L., Qin, X., Li, W., Gu, H., … Guo, Y. (2026). Outlier Matters: Efficient Long-to-Short Reasoning via Outlier-Guided Model Merging. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 35213–35221. https://doi.org/10.1609/aaai.v40i41.40828
Section
AAAI Technical Track on Natural Language Processing VI