LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Authors

  • Ting Jiang, SKLSDE and BDBC Lab, Beihang University, Beijing, China
  • Deqing Wang, SKLSDE and BDBC Lab, Beihang University, Beijing, China
  • Leilei Sun, SKLSDE and BDBC Lab, Beihang University, Beijing, China
  • Huayi Yang, SKLSDE and BDBC Lab, Beihang University, Beijing, China
  • Zhengyang Zhao, SKLSDE and BDBC Lab, Beihang University, Beijing, China
  • Fuzhen Zhuang, Key Lab of Intelligent Information Processing of CAS, Institute of Computing Technology, CAS; Beijing Advanced Innovation Center for Imaging Theory and Technology, Academy for Multidisciplinary Studies, Capital Normal University, Beijing, China

DOI

https://doi.org/10.1609/aaai.v35i9.16974

Keywords

Multi-class/Multi-label Learning & Extreme Classification

Abstract

Extreme multi-label text classification (XMC) is the task of finding the most relevant labels from a very large label set. Deep learning-based methods have recently shown significant success in XMC. However, existing methods (e.g., AttentionXML and X-Transformer) still suffer from 1) combining several models to train and predict on a single dataset, and 2) sampling negative labels statically while training the label ranking model, both of which harm the model's performance and accuracy. To address these problems, we propose LightXML, which adopts end-to-end training and dynamic negative label sampling. LightXML uses GAN-like networks to recall and rank labels: the label recall part generates negative and positive labels, and the label ranking part distinguishes the positive labels from them. With these networks, negative labels are sampled dynamically while training the label ranking part. By feeding both the label recall and ranking parts the same text representation, LightXML achieves high performance. Extensive experiments show that LightXML outperforms state-of-the-art methods on five extreme multi-label datasets with a much smaller model size and lower computational cost. In particular, on the Amazon dataset with 670K labels, LightXML reduces model size by up to 72% compared to AttentionXML. Our code is available at http://github.com/kongds/LightXML.
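To make the abstract's notion of "dynamic" negative sampling concrete, below is a minimal PyTorch-style sketch. It assumes a LightXML-like setup in which labels are partitioned into groups, a recall head scores groups from a shared text representation, and the negatives handed to the ranking part come from the recall head's current top-scoring groups. The class name, the parameter k, and the group-based candidate scheme here are illustrative assumptions, not the authors' exact implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn

class DynamicNegativeSampler(nn.Module):
    """Hypothetical sketch of LightXML-style dynamic negative sampling.

    A shared text encoder feeds two heads: a label-recall head that
    scores label groups, and a label-ranking head (not shown) that
    scores individual labels inside the recalled groups. Because the
    candidate groups are re-selected with the model's *current*
    parameters at every step, the hard negatives shift as training
    progresses, in contrast to a statically pre-sampled negative set.
    """

    def __init__(self, hidden_dim: int, num_groups: int, k: int = 10):
        super().__init__()
        self.recall_head = nn.Linear(hidden_dim, num_groups)
        self.k = k  # number of label groups kept per example

    def forward(self, text_emb: torch.Tensor, positive_groups: torch.Tensor):
        # Score all label groups with the current model state: (B, num_groups).
        group_scores = self.recall_head(text_emb)
        # Top-k groups under the current parameters supply the hard negatives.
        topk_groups = group_scores.topk(self.k, dim=-1).indices  # (B, k)
        # Always keep the true (positive) groups so the ranking part sees
        # positives alongside the dynamically sampled negatives.
        candidates = torch.cat([topk_groups, positive_groups], dim=-1)
        return group_scores, candidates
```

In this sketch, `group_scores` would be trained with the recall loss, while `candidates` selects which labels the ranking head scores at this step; both heads consume the same `text_emb`, mirroring the shared-representation design the abstract describes.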

Published

2021-05-18

How to Cite

Jiang, T., Wang, D., Sun, L., Yang, H., Zhao, Z., & Zhuang, F. (2021). LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7987-7994. https://doi.org/10.1609/aaai.v35i9.16974

Section

AAAI Technical Track on Machine Learning II