[1] S. Gao, Z. Chen, G. Chen, W. Wang, and T. Lu, “AVSegFormer: Audio-Visual Segmentation with Transformer”, AAAI, vol. 38, no. 11, pp. 12155-12163, Mar. 2024.