Predicting Emergent Tool Use in LLMs Before It Emerges: A Proxy Perspective

Authors

  • Bo-Wen Zhang University of Science and Technology Beijing
  • Yan Yan China University of Mining and Technology Beijing
  • Guang Liu Beijing Academy of Artificial Intelligence
  • Xu-Cheng Yin University of Science and Technology Beijing

DOI:

https://doi.org/10.1609/aaai.v40i41.40763

Abstract

Tool-use capabilities fundamentally transform large language models (LLMs) from passive language generators into active agents with real-world utility, drawing intense research focus. Yet, their emergent nature renders traditional scaling laws ineffective for early-stage prediction, obstructing principled model design and efficient training. In this work, we propose a proxy-task perspective that predicts tool-use capabilities by measuring early model performance on selected non-emergent proxy tasks. Our method quantifies two properties of each proxy task: alignment, which reflects how well it captures tool-use trajectories, and stability, which indicates how consistently it behaves across training conditions. These properties are used to weight predictive signals. Theoretically, we formalize how these weighted signals approximate emergent tool use through bounded extrapolation under relaxed assumptions. Empirically, we validate our approach across training checkpoints, model scales, and data setups. Results show that a carefully weighted ensemble of proxy tasks can accurately rank downstream tool-use ability long before it arises. Our findings provide new theoretical foundations and practical tools for efficient training and capability planning, and advance the understanding of how complex abilities arise in LLMs.
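The abstract's core idea, weighting each proxy task's early signal by its alignment with tool-use trajectories and its stability across training conditions, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation; the function name and the simple product-of-weights scheme are assumptions for clarity.

```python
def predict_tool_use_score(proxy_scores, alignments, stabilities):
    """Combine early proxy-task scores into one predictive signal.

    proxy_scores: early-checkpoint performance on each proxy task (0..1)
    alignments:   how well each proxy tracks tool-use trajectories (0..1)
    stabilities:  how consistent each proxy is across training setups (0..1)

    Each proxy is weighted by alignment * stability (a hypothetical choice),
    so tasks that are both predictive and reliable dominate the ensemble.
    """
    weights = [a * s for a, s in zip(alignments, stabilities)]
    total = sum(weights)
    if total == 0:
        return 0.0  # no informative proxies available
    return sum(w * p for w, p in zip(weights, proxy_scores)) / total
```

Such a weighted score can then be used to rank candidate models or checkpoints by predicted downstream tool-use ability, as the abstract describes, long before the emergent capability itself is measurable.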

Published

2026-03-14

How to Cite

Zhang, B.-W., Yan, Y., Liu, G., & Yin, X.-C. (2026). Predicting Emergent Tool Use in LLMs Before It Emerges: A Proxy Perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34629–34637. https://doi.org/10.1609/aaai.v40i41.40763

Section

AAAI Technical Track on Natural Language Processing VI