Towards Understanding In-Context Learning of Transformers Under Non-I.I.D. Scenarios
DOI:
https://doi.org/10.1609/aaai.v40i30.39724
Abstract
Understanding the generalization behavior of in-context learning (ICL) in Transformers remains a fundamental challenge, as most existing theoretical analyses are based on the assumption that data are independently and identically distributed (i.i.d.), an assumption that often does not hold in practice. Motivated by the theoretical insight that ICL operates similarly to gradient-based optimization, we leverage the concept of gradient stability to establish generalization error bounds for ICL under a general non-i.i.d. setting. Our analysis shows that two factors play a central role in ICL generalization: the number of demonstrations in the prompt and their distributional alignment with the query. In particular, increasing the number of demonstrations and improving their alignment with the query distribution lead to better generalization, even without any parameter tuning. Under mild conditions, we further prove that the generalization error can achieve the optimal convergence rate of O(N^(-1/2)), where N is the number of demonstrations. Our empirical evaluations validate the effectiveness of our theoretical findings.
Published
2026-03-14
How to Cite
Shen, Q., Wang, Y., & Xiang, J. (2026). Towards Understanding In-Context Learning of Transformers Under Non-I.I.D. Scenarios. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 25312–25320. https://doi.org/10.1609/aaai.v40i30.39724
Issue
Section
AAAI Technical Track on Machine Learning VII