[1] Y. Tian, T. Ma, L. Xie, and Q. Ye, “ChatterBox: Multimodal Referring and Grounding with Chain-of-Questions”, AAAI, vol. 39, no. 7, pp. 7401–7409, Apr. 2025.