Xu, X., C. Wu, S. Rosenman, V. Lal, W. Che, and N. Duan. “BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, June 2023, pp. 10637-47, doi:10.1609/aaai.v37i9.26263.