A Probabilistic Covariate Shift Assumption for Domain Adaptation
Keywords:Machine Learning, Domain Adaptation, Spectral Grouping
The aim of domain adaptation algorithms is to establish a learner, trained on labeled data from a source domain, that can classify samples from a target domain, in which few or no labeled data are available for training. Covariate shift, a primary assumption in several works on domain adaptation, assumes that the labeling functions of source and target domains are identical. We present a domain adaptation algorithm that assumes a relaxed version of covariate shift where the assumption that the labeling functions of the source and target domains are identical holds with a certain probability. Assuming a source deterministic large margin binary classifier, the farther a target instance is from the source decision boundary, the higher the probability that covariate shift holds. In this context, given a target unlabeled sample and no target labeled data, we develop a domain adaptation algorithm that bases its labeling decisions both on the source learner and on the similarities between the target unlabeled instances. The source labeling function decisions associated with probabilistic covariate shift, along with the target similarities are concurrently expressed on a similarity graph. We evaluate our proposed algorithm on a benchmark sentiment analysis (and domain adaptation) dataset, where state-of-the-art adaptation results are achieved. We also derive a lower bound on the performance of the algorithm.