Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains

Authors

  • Yimu Wang, University of Waterloo
  • Yihan Wu, University of Maryland, College Park
  • Hongyang Zhang, University of Waterloo

DOI:

https://doi.org/10.1609/aaai.v38i14.29497

Keywords:

ML: Adversarial Learning & Robustness, General, ML: Transfer, Domain Adaptation, Multi-Task Learning, ML: Learning Theory

Abstract

We show a hardness result for the number of training domains required to achieve a small population error in the test domain. Although many domain generalization algorithms have been developed under various domain-invariance assumptions, there is significant evidence that the out-of-distribution (o.o.d.) test accuracy of state-of-the-art o.o.d. algorithms is on par with empirical risk minimization and random guessing on domain generalization benchmarks such as DomainBed. In this work, we analyze the cause and attribute this lost generalization to the lack of training domains. We show, in a minimax lower bound fashion, that any learning algorithm that outputs a classifier with ε excess error over the Bayes-optimal classifier requires at least poly(1/ε) training domains, even when the number of training data sampled from each training domain is large. Experiments on the DomainBed benchmark demonstrate that o.o.d. test accuracy increases monotonically with the number of training domains. Our result sheds light on the intrinsic hardness of domain generalization and suggests benchmarking o.o.d. algorithms on datasets with a sufficient number of training domains.
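The abstract's empirical claim, that o.o.d. test accuracy improves as training domains are added, can be illustrated with a toy simulation. The setup below is a hypothetical construction for illustration only, not the paper's: each training domain carries a spurious feature whose correlation with the label varies across domains, the learner is plain least-squares ERM on the pooled data, and the test domain reverses the spurious correlation. With few domains the pooled spurious correlation rarely cancels, so ERM leans on the spurious feature and fails out of distribution; averaging over many domains drives that reliance toward zero.

```python
import numpy as np

def make_domain(rng, w_true, n, rho):
    """Sample one domain: invariant Gaussian features labeled by
    sign(x . w_true), plus a spurious feature whose correlation with
    the label is controlled by the domain-specific parameter rho."""
    x = rng.normal(size=(n, w_true.size))
    y = np.sign(x @ w_true)
    spur = rho * y + 0.5 * rng.normal(size=n)
    return np.column_stack([x, spur]), y

def mean_ood_accuracy(num_domains, trials=20, n=500, seed=0):
    """Average o.o.d. accuracy of pooled least-squares ERM trained on
    num_domains domains (rho drawn fresh per domain) and tested on a
    domain where the spurious correlation is reversed (rho = -2)."""
    rng = np.random.default_rng(seed)
    w_true = np.ones(5)  # invariant labeling rule (toy assumption)
    accs = []
    for _ in range(trials):
        xs, ys = zip(*(make_domain(rng, w_true, n, rng.normal())
                       for _ in range(num_domains)))
        X, Y = np.vstack(xs), np.concatenate(ys)
        w_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # ERM fit
        x_test, y_test = make_domain(rng, w_true, 2000, rho=-2.0)
        accs.append(np.mean(np.sign(x_test @ w_hat) == y_test))
    return float(np.mean(accs))

for k in (1, 5, 50):
    print(f"{k:3d} training domains -> o.o.d. accuracy {mean_ood_accuracy(k):.3f}")
```

In this toy model, accuracy on the reversed-correlation test domain climbs with the number of training domains even though each domain already supplies plenty of samples, mirroring the trend the paper reports on DomainBed.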

Published

2024-03-24

How to Cite

Wang, Y., Wu, Y., & Zhang, H. (2024). Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains. Proceedings of the AAAI Conference on Artificial Intelligence, 38(14), 15689-15697. https://doi.org/10.1609/aaai.v38i14.29497

Section

AAAI Technical Track on Machine Learning V