From Representation to Reasoning: Toward General-Purpose Visual Intelligence
DOI:
https://doi.org/10.1609/aaai.v40i47.41357
Abstract
This talk surveys my research agenda on advancing general-purpose visual intelligence, moving AI beyond static recognition toward active reasoning and embodied action. A central challenge is enabling AI systems to generalize reliably in low-data and long-tail regimes. I address this by combining multimodal representation learning with agentic reasoning frameworks such as PyVision, which equips vision models to dynamically generate tools for deliberate problem-solving, and ViGaL, which leverages gameplay to instill transferable cognitive skills for reasoning under scarcity. These efforts chart a trajectory from representation and generation to interactive, embodied agents, re-imagining AI as an active collaborator capable of tool use, imagination, and purposeful engagement across both digital and physical environments.
Published
2026-03-14
How to Cite
Wei, C. (2026). From Representation to Reasoning: Toward General-Purpose Visual Intelligence. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39836–39837. https://doi.org/10.1609/aaai.v40i47.41357
Section
New Faculty Highlights