Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition

Authors

  • Qi Rao, University of Technology Sydney
  • Ke Sun, Alibaba
  • Xiaohan Wang, Stanford University
  • Qi Wang, Alibaba
  • Bang Zhang, Alibaba

DOI:

https://doi.org/10.1609/aaai.v38i5.28265

Keywords:

CV: Video Understanding & Activity Analysis

Abstract

Continuous sign language recognition (CSLR) aims to recognize gloss sequences from continuous sign videos. Recent works enhance gloss representation consistency by mining correlations between visual and contextual modules within individual sentences. However, much richer correlations remain among glosses across different sentences. In this paper, we present a simple yet effective method, Cross-Sentence Gloss Consistency (CSGC), which enforces glosses belonging to the same category to be more consistent in representation than those belonging to different categories, across all training sentences. Specifically, CSGC maintains a prototype for each gloss category and uses it to improve gloss discrimination in a contrastive way. Building on these well-distinguished gloss prototypes, an auxiliary similarity classifier is devised to enhance the recognition clues, thus yielding more accurate results. Extensive experiments conducted on three CSLR datasets show that our proposed CSGC significantly boosts the performance of CSLR, surpassing existing state-of-the-art works by large margins (i.e., 1.6% on PHOENIX14, 2.4% on PHOENIX14-T, and 5.7% on CSL-Daily).
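
The prototype idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not say how prototypes are updated or how the contrastive objective is defined, so the moving-average update, the cosine similarity, the softmax-style loss, the `momentum` and `temperature` values, and the names `GlossPrototypes`, `HELLO`, and `THANKS` are all assumptions made for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

class GlossPrototypes:
    """Hypothetical bank holding one prototype per gloss category,
    shared across all training sentences."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed moving-average rate, not from the paper
        self.protos = {}          # gloss label -> unit-length prototype vector

    def update(self, gloss, feature):
        """Move the prototype of `gloss` toward a newly observed feature."""
        feature = normalize(feature)
        if gloss not in self.protos:
            self.protos[gloss] = feature
            return
        m = self.momentum
        old = self.protos[gloss]
        mixed = [m * o + (1 - m) * f for o, f in zip(old, feature)]
        self.protos[gloss] = normalize(mixed)

    def similarities(self, feature):
        """Cosine similarity of a feature to every stored prototype."""
        return {g: cosine(feature, p) for g, p in self.protos.items()}

    def classify(self, feature):
        """Auxiliary similarity classifier: pick the most similar prototype."""
        sims = self.similarities(feature)
        return max(sims, key=sims.get)

    def contrastive_loss(self, feature, gloss, temperature=0.1):
        """Softmax-style loss: small when the feature is closer to its own
        gloss prototype than to the prototypes of other glosses."""
        logits = {g: s / temperature for g, s in self.similarities(feature).items()}
        peak = max(logits.values())  # subtract the peak for numerical stability
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        return log_denom - logits[gloss]

# Toy usage: features of the same gloss seen in different sentences
# update the same prototype.
bank = GlossPrototypes()
bank.update("HELLO", [1.0, 0.1])   # occurrence in one sentence
bank.update("HELLO", [0.9, 0.0])   # occurrence in another sentence
bank.update("THANKS", [0.0, 1.0])

query = [0.95, 0.05]
predicted = bank.classify(query)
loss_correct = bank.contrastive_loss(query, "HELLO")
loss_wrong = bank.contrastive_loss(query, "THANKS")
```

In this toy setup the query lies close to the `HELLO` prototype, so the similarity classifier selects `HELLO` and the loss for the matching label is lower than for the mismatched one. A real system would operate on learned video features and batches rather than hand-written two-dimensional vectors.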

Published

2024-03-24

How to Cite

Rao, Q., Sun, K., Wang, X., Wang, Q., & Zhang, B. (2024). Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4650–4658. https://doi.org/10.1609/aaai.v38i5.28265

Issue

Vol. 38 No. 5 (2024)

Section

AAAI Technical Track on Computer Vision IV