GloCTM: Cross-Lingual Topic Modeling via a Global Context Space

Authors

  • Nguyen Tien Phat, Hanoi University of Science and Technology
  • Ngo Vu Minh, Hanoi University of Science and Technology
  • Linh Ngo Van, Hanoi University of Science and Technology
  • Nguyen Thi Ngoc Diep, Vietnam National University Hanoi
  • Thien Huu Nguyen, University of Oregon

DOI:

https://doi.org/10.1609/aaai.v40i39.40549

Abstract

Cross-lingual topic modeling seeks to uncover coherent and semantically aligned topics across languages—a task central to multilingual understanding. Yet most existing models learn topics in disjoint, language-specific spaces and rely on alignment mechanisms (e.g., bilingual dictionaries) that often fail to capture deep cross-lingual semantics, resulting in loosely connected topic spaces. Moreover, these approaches often overlook the rich semantic signals embedded in multilingual pretrained representations, further limiting their ability to capture fine-grained alignment. We introduce **GloCTM** (**Glo**bal Context Space for **C**ross-Lingual **T**opic **M**odel), a novel framework that enforces cross-lingual topic alignment through a unified semantic space spanning the entire model pipeline. GloCTM constructs enriched input representations by expanding bag-of-words with cross-lingual lexical neighborhoods, and infers topic proportions using both local and global encoders, with their latent representations aligned through internal regularization. At the output level, the global topic-word distribution, defined over the combined vocabulary, structurally synchronizes topic meanings across languages. To further ground topics in deep semantic space, GloCTM incorporates a Centered Kernel Alignment (CKA) loss that aligns the latent topic space with multilingual contextual embeddings. Experiments across multiple benchmarks demonstrate that GloCTM significantly improves topic coherence and cross-lingual alignment, outperforming strong baselines.
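
To make the CKA term in the abstract concrete, the following is a minimal sketch of the standard linear form of Centered Kernel Alignment, computed between a batch of document-topic proportions and a batch of contextual document embeddings. The abstract does not state which CKA variant, kernel, or loss weighting GloCTM uses, so the function names (`linear_cka`, `cka_loss`), the argument names (`theta`, `emb`), the `eps` stabilizer, and the choice of `1 - CKA` as the loss are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def linear_cka(x, y, eps=1e-12):
    """Linear CKA between two views of the same n samples.

    x: array of shape (n, d1); y: array of shape (n, d2).
    Returns a similarity in [0, 1]; 1 means the two views match
    up to rotation and isotropic scaling.
    """
    # Center each feature column so the Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    # eps (an assumption) guards against division by zero for constant inputs.
    return cross / (norm_x * norm_y + eps)


def cka_loss(theta, emb):
    """Hypothetical alignment loss: small when the topic space and the
    embedding space share the same similarity structure."""
    return 1.0 - linear_cka(theta, emb)
```

In a training setup, a term of this kind would typically be added to the topic model's objective with a weighting coefficient, and written in an autodiff framework rather than NumPy so that gradients can flow to the encoder; both of those details are likewise assumptions here.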

Published

2026-03-14

How to Cite

Phat, N. T., Minh, N. V., Van, L. N., Diep, N. T. N., & Nguyen, T. H. (2026). GloCTM: Cross-Lingual Topic Modeling via a Global Context Space. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 32710–32718. https://doi.org/10.1609/aaai.v40i39.40549

Section

AAAI Technical Track on Natural Language Processing IV