Towards Understanding In-Context Learning of Transformers Under Non-I.I.D. Scenarios

Authors

  • Qilu Shen: Key Laboratory of Smart Farming for Agricultural Animals, Wuhan, Hubei, China; College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, Hubei, China; Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, Hubei, China
  • Yingjie Wang: College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, China
  • Jinhai Xiang: Key Laboratory of Smart Farming for Agricultural Animals, Wuhan, Hubei, China; College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, Hubei, China; Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, Hubei, China

DOI:

https://doi.org/10.1609/aaai.v40i30.39724

Abstract

Understanding the generalization behavior of in-context learning (ICL) in Transformers remains a fundamental challenge, as most existing theoretical analyses are based on the assumption that data are independently and identically distributed (i.i.d.), an assumption that often does not hold in practice. Motivated by the theoretical insight that ICL operates similarly to gradient-based optimization, we leverage the concept of gradient stability to establish generalization error bounds for ICL under a general non-i.i.d. setting. Our analysis shows that two factors play a central role in ICL generalization: the number of demonstrations in the prompt and their distributional alignment with the query. In particular, increasing the number of demonstrations and improving their alignment with the query distribution lead to better generalization, even without any parameter tuning. Under mild conditions, we further prove that the generalization error can achieve the optimal convergence rate of O(N^(-1/2)), where N is the number of demonstrations. Our empirical evaluations validate the effectiveness of our theoretical findings.

Published

2026-03-14

How to Cite

Shen, Q., Wang, Y., & Xiang, J. (2026). Towards Understanding In-Context Learning of Transformers Under Non-I.I.D. Scenarios. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 25312–25320. https://doi.org/10.1609/aaai.v40i30.39724

Section

AAAI Technical Track on Machine Learning VII