[1] Y. Li, Y. Pan, T. Yao, J. Chen, and T. Mei, “Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network,” AAAI, vol. 35, no. 10, pp. 8518–8526, May 2021.