Wang, Z., Bao, R., Wu, Q., & Liu, S. (2021). Confidence-aware Non-repetitive Multimodal Transformers for TextCaps. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2835-2843. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16389