Peng, D., Liu, C., Liu, Y., & Jin, L. (2024). ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4468-4477. https://doi.org/10.1609/aaai.v38i5.28245