[1] L. Xu, Y. Gao, W. Song, and A. Hao, “Weakly Supervised Multimodal Affordance Grounding for Egocentric Images,” AAAI, vol. 38, no. 6, pp. 6324–6332, Mar. 2024.