Text-DIAE: A Self-Supervised Degradation Invariant Autoencoder for Text Recognition and Document Enhancement

Authors

  • Mohamed Ali Souibgui, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Sanket Biswas, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Andres Mafla, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Ali Furkan Biten, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Alicia Fornés, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Yousri Kessentini, Digital Research Center of Sfax, SM@RTS Laboratory, Sfax, Tunisia
  • Josep Lladós, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Lluis Gomez, Computer Vision Center, Universitat Autònoma de Barcelona, Spain
  • Dimosthenis Karatzas, Computer Vision Center, Universitat Autònoma de Barcelona, Spain

DOI:

https://doi.org/10.1609/aaai.v37i2.25328

Keywords:

CV: Representation Learning for Vision, CV: Applications, CV: Language and Vision, ML: Unsupervised & Self-Supervised Learning

Abstract

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene text) and document image enhancement. We employ a transformer-based architecture that incorporates three pretext tasks as learning objectives, optimized during pre-training without labelled data. Each pretext objective is specifically tailored to the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit the limitations of previous state-of-the-art methods based on contrastive losses, while requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state of the art in existing supervised and self-supervised settings in handwritten and scene text recognition and in document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR

Published

2023-06-26

How to Cite

Souibgui, M. A., Biswas, S., Mafla, A., Biten, A. F., Fornés, A., Kessentini, Y., Lladós, J., Gomez, L., & Karatzas, D. (2023). Text-DIAE: A Self-Supervised Degradation Invariant Autoencoder for Text Recognition and Document Enhancement. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2330-2338. https://doi.org/10.1609/aaai.v37i2.25328

Section

AAAI Technical Track on Computer Vision II