LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Ting Jiang; Deqing Wang; Leilei Sun; Huayi Yang; Zhengyang Zhao; Fuzhen Zhuang

doi:10.1609/aaai.v35i9.16974

Authors

Ting Jiang SKLSDE and BDBC Lab, Beihang University, Beijing, China
Deqing Wang SKLSDE and BDBC Lab, Beihang University, Beijing, China
Leilei Sun SKLSDE and BDBC Lab, Beihang University, Beijing, China
Huayi Yang SKLSDE and BDBC Lab, Beihang University, Beijing, China
Zhengyang Zhao SKLSDE and BDBC Lab, Beihang University, Beijing, China
Fuzhen Zhuang Key Lab of Intelligent Information Processing of CAS, Institute of Computing Technology, CAS; Beijing Advanced Innovation Center for Imaging Theory and Technology, Academy for Multidisciplinary Studies, Capital Normal University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v35i9.16974

Keywords:

Multi-class/Multi-label Learning & Extreme Classification

Abstract

Extreme multi-label text classification (XMC) is the task of finding the most relevant labels from a very large label set. Deep learning-based methods have recently shown significant success in XMC. However, existing methods (e.g., AttentionXML and X-Transformer) still suffer from two problems: 1) they combine several models to train and predict on a single dataset, and 2) they sample negative labels statically while training the label ranking model, which harms the model's performance and accuracy. To address these problems, we propose LightXML, which adopts end-to-end training and dynamic negative label sampling. In LightXML, we use GAN-like networks to recall and rank labels. The label recalling part generates candidate labels, both positive and negative, and the label ranking part distinguishes the positive labels among these candidates. Based on these networks, negative labels are sampled dynamically while the label ranking part is trained. By feeding both the label recalling and ranking parts with the same text representation, LightXML achieves high performance. Extensive experiments show that LightXML outperforms state-of-the-art methods on five extreme multi-label datasets with a much smaller model size and lower computational complexity. In particular, on the Amazon dataset with 670K labels, LightXML reduces the model size by up to 72% compared to AttentionXML. Our code is available at http://github.com/kongds/LightXML.
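The abstract describes the recall/rank split only at a high level. Below is a minimal, pure-Python sketch of how dynamic negative sampling can work under such a split. It assumes, beyond what the abstract states, that the recalling part scores label clusters, that both parts score by a dot product with one shared text vector, and that the clusters of the true labels are always added to the candidate set during training. All names (`recall_and_rank`, `cluster_emb`, `label_emb`) and all numbers are hypothetical; this is an illustration, not the LightXML implementation.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores; ties go to the lower index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def recall_and_rank(
    text_repr: Vector,
    cluster_emb: List[Vector],
    label_emb: List[Vector],
    clusters: Dict[int, List[int]],
    positive_labels: Set[int],
    k: int,
) -> Tuple[List[int], List[int], List[float]]:
    """One training step's candidate sampling and ranking (toy sketch).

    1. Recall: score every cluster from the shared text representation.
    2. Keep the k highest-scoring clusters, plus the clusters holding the
       true labels so every positive is a candidate (assumed, not sourced).
    3. Rank: score each candidate label from the same text representation.
    Candidates that are not true labels are this step's negatives; they are
    re-drawn from the current scores at every step (dynamic sampling).
    """
    cluster_scores = [dot(text_repr, e) for e in cluster_emb]
    label_to_cluster = {l: c for c, ls in clusters.items() for l in ls}
    chosen = set(top_k(cluster_scores, k))
    chosen |= {label_to_cluster[y] for y in positive_labels}
    candidates = sorted(l for c in chosen for l in clusters[c])
    targets = [1 if l in positive_labels else 0 for l in candidates]
    rank_scores = [dot(text_repr, label_emb[l]) for l in candidates]
    return candidates, targets, rank_scores


# Toy setup: 8 labels in 4 clusters, 2-d embeddings (made-up values).
clusters = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
label_emb = [[float(i), 1.0] for i in range(8)]
h = [1.0, 0.0]  # shared text representation of one document
gold = {3}      # its only true label

# Recall weights at step t, then after a (pretend) parameter update at t+1.
emb_t = [[0.9, 0.0], [0.1, 0.0], [0.8, 0.0], [0.2, 0.0]]
emb_t1 = [[0.1, 0.0], [0.7, 0.0], [0.2, 0.0], [0.9, 0.0]]

cands_t, targets_t, _ = recall_and_rank(h, emb_t, label_emb, clusters, gold, k=2)
cands_t1, targets_t1, _ = recall_and_rank(h, emb_t1, label_emb, clusters, gold, k=2)
print(cands_t, targets_t)    # [0, 1, 2, 3, 4, 5] [0, 0, 0, 1, 0, 0]
print(cands_t1, targets_t1)  # [2, 3, 6, 7] [0, 1, 0, 0]
```

In this toy run the same document receives different negatives at the two steps (labels 0, 1, 2, 4, 5 versus 2, 6, 7) because the candidates come from the model's current recall scores; a static scheme would instead fix the negatives before training begins.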