CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation

Authors

  • Rui Ke: Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen
  • Jiahui Xu: Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen
  • Shenghao Yang: Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen
  • Kuang Wang: Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen
  • Feng Jiang: Artificial Intelligence Research Institute, Shenzhen University of Advanced Technology
  • Haizhou Li: Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen; The School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen; Department of Electrical and Computer Engineering, National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v40i37.40406

Abstract

Theme detection is a fundamental task in user-centric dialogue systems, aiming to identify the latent topic of each utterance without relying on predefined schemas. Unlike intent induction, which operates within fixed label spaces, theme detection requires cross-dialogue consistency and alignment with personalized user preferences, posing significant challenges. Existing methods often struggle to build accurate topic representations from sparse, short utterances and fail to capture user-level thematic preferences across dialogues. To address these challenges, we propose CATCH (Controllable Theme Detection with Contextualized Clustering and Hierarchical Generation), a unified framework that integrates three core components: (1) context-aware topic representation, which enriches utterance-level semantics using surrounding topic segments; (2) preference-guided topic clustering, which jointly models semantic proximity and personalized feedback to align themes across dialogues; and (3) a hierarchical theme generation mechanism designed to suppress noise and produce robust, coherent topic labels. Experiments on a multi-domain customer dialogue benchmark (DSTC-12) demonstrate the effectiveness of CATCH with an 8B LLM in both theme clustering and topic generation quality.
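
The first two components described above can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the neighbour window stands in for topic segments, and the names `contextualize`, `preference_distance`, `cluster`, `must_link`, `cannot_link`, `alpha`, `penalty`, and `threshold` are all assumptions introduced for illustration. The third component (hierarchical theme generation) depends on an LLM and is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def contextualize(utterances, window=1, alpha=0.5):
    """Blend each utterance vector with its neighbours (assumed context rule)."""
    base = [embed(u) for u in utterances]
    out = []
    for i, vec in enumerate(base):
        mixed = Counter({t: float(c) for t, c in vec.items()})
        lo, hi = max(0, i - window), min(len(base), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                for t, c in base[j].items():
                    mixed[t] += alpha * c  # down-weighted context terms
        out.append(mixed)
    return out


def preference_distance(vecs, i, j, must, cannot, penalty=1.0):
    """Cosine distance shifted by user feedback on the pair (i, j)."""
    d = 1.0 - cosine(vecs[i], vecs[j])
    pair = frozenset((i, j))
    if pair in must:
        d -= penalty  # feedback says: same theme
    if pair in cannot:
        d += penalty  # feedback says: different themes
    return d


def cluster(vecs, must_link=(), cannot_link=(), threshold=0.6):
    """Single-link grouping: merge any pair closer than the threshold."""
    must = {frozenset(p) for p in must_link}
    cannot = {frozenset(p) for p in cannot_link}
    parent = list(range(len(vecs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if preference_distance(vecs, i, j, must, cannot) < threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(vecs))]


# Usage: three utterances, with and without a user-supplied constraint.
vecs = [embed(u) for u in (
    "block my card",
    "card is lost block it",
    "check account balance",
)]
print(cluster(vecs))
print(cluster(vecs, cannot_link=[(0, 1)]))
```

In this sketch, feedback enters only as a fixed shift on pairwise distances, which is one simple way to combine semantic proximity with preferences. Because merging is single-link, a cannot-link pair can still end up in one group through a third utterance, so a stricter constraint check would be needed in practice.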

Published

2026-03-14

How to Cite

Ke, R., Xu, J., Yang, S., Wang, K., Jiang, F., & Li, H. (2026). CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 31419–31428. https://doi.org/10.1609/aaai.v40i37.40406

Section

AAAI Technical Track on Natural Language Processing II