Yu, Jiaao, et al. “Activating Visual Context and Commonsense Reasoning Through Masked Prediction in VLMs”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 33, Mar. 2026, pp. 27952-60, doi:10.1609/aaai.v40i33.40019.