Unlocking Multi-Modal Potentials for Link Prediction on Dynamic Text-Attributed Graphs

Yuanyuan Xu; Wenjie Zhang; Ying Zhang; Xuemin Lin; Xiwei Xu

doi:10.1609/aaai.v40i32.39956

Authors

Yuanyuan Xu, University of New South Wales
Wenjie Zhang, University of New South Wales
Ying Zhang, Zhejiang Gongshang University
Xuemin Lin, Shanghai Jiao Tong University
Xiwei Xu, CSIRO

DOI:

https://doi.org/10.1609/aaai.v40i32.39956

Abstract

Dynamic Text-Attributed Graphs (DyTAGs) are a novel graph paradigm that captures evolving temporal events (edges) alongside rich textual attributes. Existing studies can be broadly categorized into temporal graph neural network (TGNN)-driven and large language model (LLM)-driven approaches, both of which encode textual attributes and temporal structures for DyTAG representation. We observe that DyTAGs inherently comprise three distinct modalities (temporal, textual, and structural), which often exhibit completely disjoint distributions. However, the first two modalities are largely overlooked by existing studies, leading to suboptimal performance. To address this, we propose MoMent, a multi-modal network that explicitly models, integrates, and aligns each modality to learn node representations for link prediction. Given the disjoint nature of the original modality distributions, we first construct modality-specific features and encode them with individual encoders to capture correlations across temporal patterns, semantic context, and local structures. Each encoder generates modality-specific tokens, which are then fused into comprehensive node representations with a theoretical guarantee. To keep these heterogeneous modalities from occupying disjoint subspaces, we propose a dual-domain alignment loss that first aligns their distributions globally and then fine-tunes coherence at the instance level. This makes the representations more coherent across the temporal, textual, and structural views. Extensive experiments across seven datasets show that MoMent achieves up to 17.28% accuracy improvement and up to 31× speed-up over eight baselines.
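
The abstract does not give the form of the dual-domain alignment loss, so the following is only a minimal sketch of the general idea (a global distribution term followed by an instance-level term), not MoMent's actual objective. It assumes the global term is the squared distance between mean tokens (a linear-kernel MMD) and the instance-level term is an InfoNCE-style contrastive loss; both choices, along with every function and parameter name (`global_alignment`, `instance_alignment`, `dual_alignment_loss`, `weight`, `temperature`), are hypothetical.

```python
import math
from itertools import combinations


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def global_alignment(a, b):
    """Assumed global term: squared distance between the mean tokens
    of two modalities (equivalent to a linear-kernel MMD)."""
    ma, mb = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb))


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv + 1e-12)


def instance_alignment(a, b, temperature=0.1):
    """Assumed instance-level term: InfoNCE-style loss in which token i
    of modality `a` should match token i of modality `b` (same node)."""
    total = 0.0
    for i, u in enumerate(a):
        logits = [cosine(u, v) / temperature for v in b]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / len(a)


def dual_alignment_loss(tokens, weight=1.0, temperature=0.1):
    """Sum both terms over every pair of modalities.

    tokens: dict mapping a modality name to a list of per-node token
    vectors, with nodes in the same order for every modality.
    """
    loss = 0.0
    for (_, a), (_, b) in combinations(sorted(tokens.items()), 2):
        loss += global_alignment(a, b)
        loss += weight * instance_alignment(a, b, temperature)
    return loss


# Toy usage: three modalities, two nodes, 2-dimensional tokens.
aligned = {
    "temporal": [[1.0, 0.0], [0.0, 1.0]],
    "textual": [[1.0, 0.0], [0.0, 1.0]],
    "structural": [[1.0, 0.0], [0.0, 1.0]],
}
# Same tokens, but the textual modality assigns them to the wrong nodes.
shuffled = dict(aligned, textual=[[0.0, 1.0], [1.0, 0.0]])

print(dual_alignment_loss(aligned) < dual_alignment_loss(shuffled))
```

In this toy case the global term is zero for both inputs because the per-modality means coincide; only the instance-level term penalizes the mismatched node assignment, which illustrates why a distribution-level term alone would not enforce per-node coherence.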
