From Representation to Reasoning: Toward General-Purpose Visual Intelligence

Authors

  • Chen Wei, Rice University

DOI:

https://doi.org/10.1609/aaai.v40i47.41357

Abstract

This talk surveys my research agenda on advancing general-purpose visual intelligence, moving AI beyond static recognition toward active reasoning and embodied action. A central challenge is enabling AI systems to generalize reliably in low-data and long-tail regimes. I address this by combining multimodal representation learning with agentic reasoning frameworks such as PyVision, which equips vision models to dynamically generate tools for deliberate problem-solving, and ViGaL, which leverages gameplay to instill transferable cognitive skills for reasoning under scarcity. These efforts chart a trajectory from representation and generation to interactive, embodied agents, re-imagining AI as an active collaborator capable of tool use, imagination, and purposeful engagement across both digital and physical environments.

Published

2026-03-14

How to Cite

Wei, C. (2026). From Representation to Reasoning: Toward General-Purpose Visual Intelligence. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39836–39837. https://doi.org/10.1609/aaai.v40i47.41357