Li, Y., Pan, Y., Yao, T., Chen, J., & Mei, T. (2021). Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8518-8526. https://doi.org/10.1609/aaai.v35i10.17034