Compose with Me: Collaborative Music Inpainter for Symbolic Music Infilling

Authors

  • Zhejing Hu, The Hong Kong Polytechnic University
  • Yan Liu, The Hong Kong Polytechnic University
  • Gong Chen, The Hong Kong Polytechnic University
  • Bruce X.B. Yu, Zhejiang University-University of Illinois Urbana-Champaign Institute

DOI:

https://doi.org/10.1609/aaai.v39i2.32122

Abstract

The field of music generation has seen a surge of interest from both academia and industry, with innovative platforms such as Suno, Udio, and SkyMusic earning widespread recognition. However, the challenge of music infilling—modifying specific music segments without reconstructing the entire piece—remains a significant hurdle for both audio-based and symbolic-based models, limiting their adaptability and practicality. In this paper, we address symbolic music infilling by introducing the Collaborative Music Inpainter (CMI), an advanced human-in-the-loop (HITL) model for music infilling. The CMI features the Joint Embedding Predictive Autoregressive Generative Architecture (JEP-AGA), which learns the high-level predictive representations of the masked part that needs to be infilled during the autoregressive generative process, akin to how humans perceive and interpret music. The newly developed Dynamic Interaction Learner (DIL) achieves HITL by iteratively refining the infilled output based on user interactions alone, significantly reducing the interaction cost without requiring further input. Experimental results confirm CMI’s superior performance in music infilling, demonstrating its efficiency in producing high-quality music.

Published

2025-04-11

How to Cite

Hu, Z., Liu, Y., Chen, G., & Yu, B. X. (2025). Compose with Me: Collaborative Music Inpainter for Symbolic Music Infilling. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1327–1335. https://doi.org/10.1609/aaai.v39i2.32122

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems