Wang, D., and D. Xiong. “Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, May 2021, pp. 2720-2728, https://ojs.aaai.org/index.php/AAAI/article/view/16376.