Li, Y., Pan, Y., Yao, T., Chen, J., & Mei, T. (2021). Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8518-8526. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17034