[1]
W. Zhang, Y. Ying, P. Lu, and H. Zha, “Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption,” AAAI, vol. 34, no. 05, pp. 9571–9578, Apr. 2020.