Zhang, Y., Shen, G., Ning, K., Ren, T., Qiu, X., Wang, M., & Kong, X. (2026). Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16397–16405. https://doi.org/10.1609/aaai.v40i19.38678