Deep Multi-modal Graph Clustering via Graph Transformer Network

Authors

  • Qianqian Wang, Xidian University; Anhui University
  • Haiming Xu, Xidian University
  • Zihao Zhang, Xidian University
  • Wei Feng, Xi'an Jiaotong University
  • Quanxue Gao, Xidian University

DOI:

https://doi.org/10.1609/aaai.v39i8.32844

Abstract

Current deep multi-modal graph clustering methods rely primarily on Graph Neural Networks (GNNs) to exploit attribute features and graph structure through message propagation and low-dimensional feature embedding. However, these methods do not explore graph structural information further, such as the relationships between nodes and the shortest paths between them. They may also fail to sufficiently mine the complementary information among multi-modal graph data. To address these issues, we propose a novel method, Deep Multi-modal Graph Clustering via Graph Transformer Network (DMGC-GTN). The method thoroughly dissects and utilizes graph structural information by applying graph smoothing to node features and incorporating various forms of embeddings into the transformer architecture. This yields a unified embedding of graph structure and multi-modal feature attributes that fully exploits the complementary information within multi-modal graph data. Extensive experiments demonstrate the effectiveness of our algorithm.
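
The abstract names two structural ingredients, graph smoothing of node features and shortest paths between nodes, without giving their exact form. The sketch below illustrates generic versions of both on a toy graph and should not be read as the DMGC-GTN implementation: the symmetric-normalized averaging filter, the hop-count definition of path length, and the names `smooth_features` and `hop_distances` are all assumptions introduced here for illustration.

```python
from collections import deque
from math import sqrt


def smooth_features(adj, feats, steps=1):
    """Apply `steps` rounds of symmetric-normalized neighborhood averaging.

    Assumed filter (not taken from the paper):
    X <- D^{-1/2} (A + I) D^{-1/2} X, where D is the degree matrix of A + I.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    nbrs = [set(adj[i]) | {i} for i in range(n)]
    deg = [len(nbrs[i]) for i in range(n)]
    for _ in range(steps):
        feats = [
            [
                sum(feats[j][k] / sqrt(deg[i] * deg[j]) for j in nbrs[i])
                for k in range(len(feats[i]))
            ]
            for i in range(n)
        ]
    return feats


def hop_distances(adj, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Toy path graph 0 - 1 - 2 - 3 with one scalar feature per node.
adj = [[1], [0, 2], [1, 3], [2]]
feats = [[1.0], [0.0], [0.0], [0.0]]
smoothed = smooth_features(adj, feats, steps=1)
dists = hop_distances(adj, 0)
```

After one smoothing step the feature mass at node 0 spreads to its neighbor, node 1, while nodes two or more hops away are still untouched. Pairwise hop counts of this kind are one common way to give a transformer a structural signal, for example as an attention bias; whether DMGC-GTN uses them in that way is not stated in the abstract.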

Published

2025-04-11

How to Cite

Wang, Q., Xu, H., Zhang, Z., Feng, W., & Gao, Q. (2025). Deep Multi-modal Graph Clustering via Graph Transformer Network. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 7835–7843. https://doi.org/10.1609/aaai.v39i8.32844

Section

AAAI Technical Track on Computer Vision VII