Hierarchical Text Classification as Sub-hierarchy Sequence Generation

Authors

  • SangHun Im School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH)
  • GiBaeg Kim School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH)
  • Heung-Seon Oh School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH)
  • Seongung Jo School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH)
  • Dong Hwan Kim School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH)

DOI:

https://doi.org/10.1609/aaai.v37i11.26520

Keywords:

SNLP: Text Classification, ML: Multi-Class/Multi-Label Learning & Extreme Classification, SNLP: Information Extraction, SNLP: Text Mining

Abstract

Hierarchical text classification (HTC) is essential for various real-world applications. However, HTC models are challenging to develop because they must often process a large volume of documents and labels organized in a hierarchical taxonomy. Recent deep-learning-based HTC models have attempted to incorporate hierarchy information into the model structure. As a result, their parameter counts grow with the hierarchy size, making these models difficult to implement for large-scale hierarchies. To solve this problem, we formulate HTC as sub-hierarchy sequence generation, encoding hierarchy information in the target label sequence instead of the model structure. We then propose the Hierarchy DECoder (HiDEC), which decodes a text sequence into a sub-hierarchy sequence using recursive hierarchy decoding, classifying all parents at the same level into their children at once. In addition, via an attention mechanism and hierarchy-aware masking, HiDEC is trained to use the hierarchical path information from the root to each leaf in the sub-hierarchy composed of a target document's labels. HiDEC achieved state-of-the-art performance with significantly fewer model parameters than existing models on benchmark datasets such as RCV1-v2, NYT, and EURLEX57K.
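To illustrate the level-wise expansion the abstract describes, the following is a minimal sketch of recursive hierarchy decoding. The toy taxonomy, scoring function, and threshold are illustrative assumptions for exposition, not the authors' model: in HiDEC the per-child scores would come from the attention-based decoder conditioned on the text sequence.

```python
# Toy taxonomy: parent label -> child labels (an illustrative assumption,
# not a dataset from the paper).
TAXONOMY = {
    "root": ["news", "sports"],
    "news": ["politics", "economy"],
    "sports": ["soccer", "tennis"],
}

def decode_sub_hierarchy(score_fn, threshold=0.5):
    """Expand the hierarchy level by level: at each step, classify all
    active parents' children at once and keep those scoring above the
    threshold, mirroring the recursive decoding idea at a high level."""
    selected, frontier = [], ["root"]
    while frontier:
        next_frontier = []
        for parent in frontier:
            for child in TAXONOMY.get(parent, []):
                if score_fn(parent, child) >= threshold:
                    selected.append(child)
                    next_frontier.append(child)  # expand this child next level
        frontier = next_frontier
    return selected

# Stand-in scorer for one hypothetical document; in the real model this
# would be produced by the text-conditioned decoder.
def toy_scores(parent, child):
    return 0.9 if child in {"news", "economy"} else 0.1

print(decode_sub_hierarchy(toy_scores))  # -> ['news', 'economy']
```

Decoding stops naturally when no child at the current level is selected, so the output is a sub-hierarchy (a root-connected set of label paths) rather than a flat label set.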

Published

2023-06-26

How to Cite

Im, S., Kim, G., Oh, H.-S., Jo, S., & Kim, D. H. (2023). Hierarchical Text Classification as Sub-hierarchy Sequence Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 12933-12941. https://doi.org/10.1609/aaai.v37i11.26520

Section

AAAI Technical Track on Speech & Natural Language Processing