Modeling Temporal Tonal Relations in Polyphonic Music Through Deep Networks With a Novel Image-Based Representation


  • Ching-Hua Chuan University of North Florida; University of Miami
  • Dorien Herremans Singapore University of Technology and Design; Institute of High Performance Computing, A*STAR



music, knowledge representation, CNN, LSTM, autoencoder


We propose an end-to-end approach for modeling polyphonic music with a novel graphical representation, based on music theory, in a deep neural network. Despite the success of deep learning in various applications, it remains a challenge to incorporate existing domain knowledge in a network without affecting its training routines. In this paper we present a novel approach for predictive music modeling and music generation that incorporates domain knowledge in its representation. In this work, music is transformed into a 2D representation, inspired by tonnetz from music theory, which graphically encodes musical relationships between pitches. This representation is incorporated in a deep network structure consisting of multilayered convolutional neural networks (CNN, for learning an efficient abstract encoding of the representation) and recurrent neural networks with long short-term memory cells (LSTM, for capturing temporal dependencies in music sequences). We empirically evaluate the nature and the effectiveness of the network by using a dataset of classical music from various composers. We investigate the effect of parameters including the number of convolution feature maps, pooling strategies, and three configurations of the network: LSTM without CNN, LSTM with CNN (pre-trained vs. not pre-trained). Visualizations of the feature maps and filters in the CNN are explored, and a comparison is made between the proposed tonnetz-inspired representation and pianoroll, a commonly used representation of music in computational systems. Experimental results show that the tonnetz representation produces musical sequences that are more tonally stable and contain more repeated patterns than sequences generated by pianoroll-based models, a finding that is directly useful for tackling current challenges in music and AI such as smart music generation.




How to Cite

Chuan, C.-H., & Herremans, D. (2018). Modeling Temporal Tonal Relations in Polyphonic Music Through Deep Networks With a Novel Image-Based Representation. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).



Main Track: Machine Learning Applications