Li, Yehao, et al. “Scheduled Sampling in Vision-Language Pretraining With Decoupled Encoder-Decoder Network.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, May 2021, pp. 8518-26, doi:10.1609/aaai.v35i10.17034.