Song, L., Liu, J., Qian, B., & Chen, Y. (2019). Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8885-8892. https://doi.org/10.1609/aaai.v33i01.33018885