MyGram: Modality-aware Graph Transformer with Global Distribution for Multi-modal Entity Alignment

Zhifei Li; Ziyue Qin; Xiangyu Luo; Xiaoju Hou; Yue Zhao; Miao Zhang; Zhifang Huang; Kui Xiao; Bing Yang

doi:10.1609/aaai.v40i23.39003

Authors

Zhifei Li School of Computer Science, Hubei University, Wuhan 430062, China Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan 430062, China Key Laboratory of Intelligent Sensing System and Security (Hubei University), Ministry of Education, Wuhan 430062, China
Ziyue Qin School of Computer Science, Hubei University, Wuhan 430062, China
Xiangyu Luo School of Cyber Science and Technology, Hubei University, Wuhan 430062, China
Xiaoju Hou Institute of Vocational Education, Guangdong Industry Polytechnic University, Guangzhou 510300, China
Yue Zhao Shandong Police College, Ji’nan 250200, China
Miao Zhang School of Computer Science, Hubei University, Wuhan 430062, China
Zhifang Huang School of Computer Science, Hubei University, Wuhan 430062, China
Kui Xiao School of Computer Science, Hubei University, Wuhan 430062, China
Bing Yang School of Computer Science, Hubei University, Wuhan 430062, China


Abstract

Multi-modal entity alignment aims to identify equivalent entities between two multi-modal knowledge graphs by integrating multi-modal data, such as images and text, to enrich the semantic representations of entities. However, existing methods may overlook the structural contextual information within each modality, making them vulnerable to interference from shallow features. To address this issue, we propose MyGram, a modality-aware graph transformer with global distribution for multi-modal entity alignment. Specifically, we develop a modality diffusion learning module to capture deep structural contextual information within modalities and enable fine-grained multi-modal fusion. In addition, we introduce a Gram Loss that acts as a regularization constraint by minimizing the volume of a 4-dimensional parallelotope formed by multi-modal features, thereby achieving global distribution consistency across modalities. We conduct experiments on five public datasets. Results show that MyGram outperforms baseline models, achieving a maximum improvement of 4.8% in Hits@1 on FBDB15K, 9.9% on FBYG15K, and 4.3% on DBP15K.
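The abstract does not give the exact formulation of the Gram Loss, so the following is only a minimal sketch of the geometric quantity it names. The volume of the parallelotope spanned by k vectors equals the square root of the determinant of their Gram matrix (the matrix of pairwise inner products). The sketch assumes, hypothetically, that each entity contributes four L2-normalized modality embeddings and that the loss is the mean volume over a batch; the function names (`gram_volume_loss` and the helpers) are illustrative and not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gram_matrix(vectors: List[Vector]) -> List[List[float]]:
    """G[i][j] = <v_i, v_j>, the pairwise inner products of the vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def determinant(matrix: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # numerically singular: the vectors are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def parallelotope_volume(vectors: List[Vector]) -> float:
    """k-dimensional volume spanned by k vectors: sqrt(det(Gram matrix))."""
    g = gram_matrix(vectors)
    return math.sqrt(max(determinant(g), 0.0))  # clamp tiny negative round-off


def gram_volume_loss(batch: List[List[Vector]]) -> float:
    """Hypothetical loss: mean parallelotope volume over a batch of entities,
    where each entity supplies one embedding per modality (four here)."""
    volumes = [parallelotope_volume([normalize(v) for v in mods]) for mods in batch]
    return sum(volumes) / len(volumes)


# Usage: four identical modality embeddings vs. four mutually orthogonal ones.
aligned = [[1.0, 2.0, 0.0, 0.0, 1.0]] * 4
orthogonal = [[1.0 if i == j else 0.0 for i in range(5)] for j in range(4)]
print(gram_volume_loss([aligned]))     # 0.0
print(gram_volume_loss([orthogonal]))  # 1.0
```

With unit-normalized inputs the volume lies in [0, 1] (by Hadamard's inequality): it is 1 when the four embeddings are mutually orthogonal and 0 when they are linearly dependent, so minimizing it pulls an entity's modality embeddings toward a shared low-dimensional subspace. Whether MyGram normalizes, averages, or combines this term with other objectives in this way is not stated in the abstract.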
