Topic Correlation Analysis for Cross-Domain Text Classification

Lianghao Li; Xiaoming Jin; Mingsheng Long

doi:10.1609/aaai.v26i1.8308

Authors

Lianghao Li Tsinghua University
Xiaoming Jin Tsinghua University
Mingsheng Long Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v26i1.8308

Keywords:

Domain Adaptation, Topic Modeling, Text Classification, Transfer Learning

Abstract

Cross-domain text classification aims to automatically train a precise text classifier for a target domain by using labeled text data from a related source domain. To this end, the distribution gap between the domains has to be reduced. In previous work, a certain number of shared latent features (e.g., latent topics, principal components) are extracted to represent documents from different domains, thereby reducing the distribution gap. However, relying only on the shared latent features as the domain bridge may limit the amount of knowledge transferred. This limitation is more serious when the distribution gap is so large that only a small number of latent features can be shared between domains. In this paper, we propose a novel approach named Topic Correlation Analysis (TCA), which extracts both the shared and the domain-specific latent features to facilitate effective knowledge transfer. In TCA, all word features are first grouped into shared and domain-specific topics using a joint mixture model. Then the correlations between the two kinds of topics are inferred and used to induce a mapping between the domain-specific topics of the different domains. Finally, both the shared and the mapped domain-specific topics are used to span a new shared feature space in which the supervised knowledge can be effectively transferred. Experimental results on two real-world data sets demonstrate the superiority of the proposed method over state-of-the-art baselines.
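
The last two steps described above (correlating domain-specific topics with shared topics, then mapping them across domains into one feature space) can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the joint mixture model or the correlation measure, so the sketch assumes that per-document topic weights are already available and uses cosine similarity as a stand-in. All function and variable names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def topic_correlations(doc_specific, doc_shared):
    """Cosine similarity between each domain-specific topic and each shared
    topic, computed over documents (an assumed stand-in for the paper's
    correlation measure).

    doc_specific: (n_docs, n_specific) topic weights per document
    doc_shared:   (n_docs, n_shared)   topic weights per document
    returns:      (n_specific, n_shared)
    """
    a = doc_specific / (np.linalg.norm(doc_specific, axis=0, keepdims=True) + EPS)
    b = doc_shared / (np.linalg.norm(doc_shared, axis=0, keepdims=True) + EPS)
    return a.T @ b


def map_specific_topics(corr_src, corr_tgt):
    """Map source-specific topics onto target-specific topics by comparing
    their correlation profiles over the shared topics.

    returns: (n_src_specific, n_tgt_specific), each row summing to about 1
    """
    a = corr_src / (np.linalg.norm(corr_src, axis=1, keepdims=True) + EPS)
    b = corr_tgt / (np.linalg.norm(corr_tgt, axis=1, keepdims=True) + EPS)
    sim = np.clip(a @ b.T, 0.0, None)  # keep only non-negative similarity
    return sim / (sim.sum(axis=1, keepdims=True) + EPS)


def build_features(doc_shared, doc_specific, mapping=None):
    """Concatenate shared topic weights with domain-specific weights,
    optionally re-expressed in the other domain's specific topics."""
    mapped = doc_specific if mapping is None else doc_specific @ mapping
    return np.hstack([doc_shared, mapped])


# Toy data: two documents per domain, two shared and two specific topics.
shared_src = np.array([[1.0, 0.0], [0.0, 1.0]])
spec_src = np.array([[1.0, 0.0], [0.0, 1.0]])  # source topic 0 goes with shared 0
shared_tgt = np.array([[1.0, 0.0], [0.0, 1.0]])
spec_tgt = np.array([[0.0, 1.0], [1.0, 0.0]])  # target topic 1 goes with shared 0

mapping = map_specific_topics(
    topic_correlations(spec_src, shared_src),
    topic_correlations(spec_tgt, shared_tgt),
)
features_src = build_features(shared_src, spec_src, mapping)
features_tgt = build_features(shared_tgt, spec_tgt)
```

In this toy setup, source-specific topic 0 and target-specific topic 1 both co-occur with shared topic 0, so the mapping pairs them, and documents from both domains end up in the same four-dimensional space. A classifier trained on `features_src` with source labels could then be applied to `features_tgt`.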
