Augmenting Human Creativity with Machine Learning
DOI: https://doi.org/10.1609/aaai.v40i47.41344
Abstract
In this talk, I will survey my work in three main research directions: 1) generative models for music creation, 2) AI-assisted music creation tools, and 3) multimodal generative models for content creation. In particular, I will discuss our recent work on AI-assisted video editing, which explores novel machine learning models that can cut, select, and rearrange a long video into a short one. In the first project, TeaserGen, we proposed a narration-centered teaser generation system that can effectively compress documentaries longer than 30 minutes into teasers under 3 minutes, leveraging pretrained LLMs and vision-language models. In the second project, REGen, we proposed a retrieval-embedded generation framework that allows an LLM to quote multimodal resources while maintaining a coherent narrative. I will conclude by discussing our future work towards next-generation video editing interfaces using multimodal LLMs and retrieval-embedded generation, as well as playful human-AI music co-creation systems in which the user can control a music generation system through hand gestures and body movements.
Published
2026-03-14
How to Cite
Dong, H.-W. (2026). Augmenting Human Creativity with Machine Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39818–39819. https://doi.org/10.1609/aaai.v40i47.41344
Section
New Faculty Highlights