[1] L. Song, J. Liu, B. Qian, and Y. Chen, “Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding,” AAAI, vol. 33, no. 01, pp. 8885–8892, Jul. 2019.