Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains

Authors

  • Yimu Wang, University of Waterloo
  • Yihan Wu, University of Maryland, College Park
  • Hongyang Zhang, University of Waterloo

DOI:

https://doi.org/10.1609/aaai.v38i14.29497

Keywords:

ML: Adversarial Learning & Robustness, General, ML: Transfer, Domain Adaptation, Multi-Task Learning, ML: Learning Theory

Abstract

We show a hardness result for the number of training domains required to achieve a small population error in the test domain. Although many domain generalization algorithms have been developed under various domain-invariance assumptions, there is significant evidence that the out-of-distribution (o.o.d.) test accuracy of state-of-the-art o.o.d. algorithms is on par with empirical risk minimization and random guessing on domain generalization benchmarks such as DomainBed. In this work, we analyze the cause and attribute this lost generalization to the lack of training domains. We show, in a minimax lower bound fashion, that any learning algorithm that outputs a classifier with ε excess error over the Bayes-optimal classifier requires at least poly(1/ε) training domains, even when the number of training data sampled from each training domain is large. Experiments on the DomainBed benchmark demonstrate that o.o.d. test accuracy increases monotonically with the number of training domains. Our result sheds light on the intrinsic hardness of domain generalization and suggests benchmarking o.o.d. algorithms on datasets with a sufficient number of training domains.
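The abstract's empirical claim, that o.o.d. test accuracy improves as training domains are added, can be illustrated with a toy simulation. The setup below is a hypothetical construction for illustration only, not the paper's: each training domain carries a spurious feature whose correlation with the label varies across domains, the learner is plain least-squares ERM on the pooled data, and the test domain reverses the spurious correlation. With few domains the pooled spurious correlation rarely cancels, so ERM leans on the spurious feature and fails out of distribution; averaging over many domains drives that reliance toward zero.

```python
import numpy as np

def make_domain(rng, w_true, n, rho):
    """Sample one domain: invariant Gaussian features labeled by
    sign(x . w_true), plus a spurious feature whose correlation with
    the label is controlled by the domain-specific parameter rho."""
    x = rng.normal(size=(n, w_true.size))
    y = np.sign(x @ w_true)
    spur = rho * y + 0.5 * rng.normal(size=n)
    return np.column_stack([x, spur]), y

def mean_ood_accuracy(num_domains, trials=20, n=500, seed=0):
    """Average o.o.d. accuracy of pooled least-squares ERM trained on
    num_domains domains (rho drawn fresh per domain) and tested on a
    domain where the spurious correlation is reversed (rho = -2)."""
    rng = np.random.default_rng(seed)
    w_true = np.ones(5)  # invariant labeling rule (toy assumption)
    accs = []
    for _ in range(trials):
        xs, ys = zip(*(make_domain(rng, w_true, n, rng.normal())
                       for _ in range(num_domains)))
        X, Y = np.vstack(xs), np.concatenate(ys)
        w_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # ERM fit
        x_test, y_test = make_domain(rng, w_true, 2000, rho=-2.0)
        accs.append(np.mean(np.sign(x_test @ w_hat) == y_test))
    return float(np.mean(accs))

for k in (1, 5, 50):
    print(f"{k:3d} training domains -> o.o.d. accuracy {mean_ood_accuracy(k):.3f}")
```

In this toy model, accuracy on the reversed-correlation test domain climbs with the number of training domains even though each domain already supplies plenty of samples, mirroring the trend the paper reports on DomainBed.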

Published

2024-03-24

How to Cite

Wang, Y., Wu, Y., & Zhang, H. (2024). Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains. Proceedings of the AAAI Conference on Artificial Intelligence, 38(14), 15689-15697. https://doi.org/10.1609/aaai.v38i14.29497

Section

AAAI Technical Track on Machine Learning V