[1] Y. Zhang, “Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation”, AAAI, vol. 39, no. 12, pp. 13322–13330, Apr. 2025.