[1] J. Yu, “Activating Visual Context and Commonsense Reasoning Through Masked Prediction in VLMs”, AAAI, vol. 40, no. 33, pp. 27952–27960, Mar. 2026.