Zhang, W., Ying, Y., Lu, P., & Zha, H. (2020). Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 9571-9578. https://doi.org/10.1609/aaai.v34i05.6503