Gao, S., Chen, Z., Chen, G., Wang, W. and Lu, T. (2024) “AVSegFormer: Audio-Visual Segmentation with Transformer”, Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), pp. 12155–12163. doi: 10.1609/aaai.v38i11.29104.