Learning to Communicate and Solve Visual Blocks-World Tasks


  • Qi Zhang University of Michigan
  • Richard Lewis University of Michigan
  • Satinder Singh University of Michigan
  • Edmund Durfee University of Michigan




We study emergent communication between speaker and listener recurrent neural-network agents that are tasked with cooperatively constructing a blocks-world target image sampled from a generative grammar of block configurations. The speaker receives the target image and learns to emit a sequence of discrete symbols from a fixed vocabulary. The listener learns to construct a blocks-world image by choosing block-placement actions as a function of the speaker’s full utterance and the image of the ongoing construction. Our contributions are (a) the introduction of a task domain for studying emergent communication that is both challenging and affords useful analyses of the emergent protocols; (b) an empirical comparison of the interpolation and extrapolation performance of training via supervised, (contextual) bandit, and reinforcement learning (RL); and (c) evidence for the emergence of interesting linguistic properties in the RL agent protocol that are distinct from those of the protocols learned under the other two training regimes.
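The speaker-listener interaction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the agents here are random placeholders rather than trained recurrent networks, the "image" is reduced to a list of column heights, and every name and constant (VOCAB_SIZE, MAX_UTTERANCE_LEN, GRID_WIDTH, the stop action, run_episode) is a hypothetical choice made only for illustration.

```python
import random
from typing import List, Tuple

# All constants below are illustrative assumptions, not values from the paper.
VOCAB_SIZE = 8          # size of the fixed symbol vocabulary
MAX_UTTERANCE_LEN = 5   # number of symbols the speaker emits
GRID_WIDTH = 4          # number of columns in the toy blocks world
STOP = GRID_WIDTH       # assumed extra action that ends construction

Grid = List[int]        # column heights: a toy stand-in for an image


def speaker(target: Grid, rng: random.Random) -> List[int]:
    """Placeholder speaker: emits random symbols.

    A learned speaker would condition its symbols on `target`.
    """
    return [rng.randrange(VOCAB_SIZE) for _ in range(MAX_UTTERANCE_LEN)]


def listener(utterance: List[int], canvas: Grid, rng: random.Random) -> int:
    """Placeholder listener: picks a random column or the stop action.

    A learned listener would condition on the full utterance and on
    the current state of the construction (`canvas`).
    """
    return rng.randrange(GRID_WIDTH + 1)


def run_episode(target: Grid, max_steps: int = 10,
                seed: int = 0) -> Tuple[List[int], Grid, bool]:
    """Run one cooperative episode and report whether the target was built."""
    rng = random.Random(seed)
    # The speaker sees the target once and produces the whole utterance.
    utterance = speaker(target, rng)
    canvas = [0] * GRID_WIDTH
    # The listener then places blocks one at a time.
    for _ in range(max_steps):
        action = listener(utterance, canvas, rng)
        if action == STOP:
            break
        canvas[action] += 1
    return utterance, canvas, canvas == target


if __name__ == "__main__":
    utterance, canvas, success = run_episode(target=[1, 2, 0, 1])
    print("utterance:", utterance)
    print("canvas:", canvas)
    print("success:", success)
```

Under these assumptions, the success flag is the kind of signal a training procedure could use, with the placeholder functions replaced by learned policies.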




Zhang, Q., Lewis, R., Singh, S., & Durfee, E. (2019). Learning to Communicate and Solve Visual Blocks-World Tasks. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 5781-5788. https://doi.org/10.1609/aaai.v33i01.33015781



AAAI Technical Track: Machine Learning