[1] Y. Zhang, “Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision,” AAAI, vol. 40, no. 19, pp. 16397–16405, Mar. 2026.