Gao, S., Chen, Z., Chen, G., Wang, W., & Lu, T. (2024). AVSegFormer: Audio-Visual Segmentation with Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12155–12163. https://doi.org/10.1609/aaai.v38i11.29104