Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models

Authors

  • Long Chen, The Hong Kong University of Science and Technology, Hong Kong

DOI:

https://doi.org/10.1609/aaai.v39i27.35101

Abstract

The astonishing abilities of pretrained foundation models (e.g., large language models (LLMs), vision-language models, and diffusion models) have revolutionized today’s AI research and development. In this talk, I will address two questions: Q1: How can we efficiently train or fine-tune foundation models? Q2: How can we build strong open-world multimodal understanding and generation models on top of these pretrained foundation models?

Published

2025-04-11

How to Cite

Chen, L. (2025). Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(27), 28706–28706. https://doi.org/10.1609/aaai.v39i27.35101