Wang, D., & Xiong, D. (2021). Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2720-2728. https://doi.org/10.1609/aaai.v35i4.16376