(1) Geng, S.; Gao, P.; Chatterjee, M.; Hori, C.; Le Roux, J.; Zhang, Y.; Li, H.; Cherian, A. Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers. AAAI 2021, 35, 1415-1423.