Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. “Learning Relationships Between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis.” Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8992-8999. Accessed April 17, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/6431.