Wang, Dexin, and Deyi Xiong. 2021. “Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding”. Proceedings of the AAAI Conference on Artificial Intelligence 35 (4):2720-28. https://doi.org/10.1609/aaai.v35i4.16376.