Hand-Model-Aware Sign Language Recognition

Authors

  • Hezhen Hu, CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China
  • Wengang Zhou, CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
  • Houqiang Li, CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

Keywords

Language and Vision

Abstract

Hand gestures play a dominant role in the expression of sign language. Current deep-learning-based video sign language recognition (SLR) methods usually follow a data-driven paradigm under the supervision of the category label. However, these methods suffer from limited interpretability and may overfit because sign data sources are limited. In this paper, we introduce the hand prior and propose a new hand-model-aware framework for isolated SLR with the modeled hand as the intermediate representation. We first transform the cropped hand sequence into a latent semantic feature. The hand model then introduces the hand prior and provides a mapping from the semantic feature to a compact hand pose representation. Finally, the inference module enhances the spatio-temporal pose representation and performs the final recognition. Because current sign language datasets lack hand pose annotations, we further guide hand pose learning with multiple weakly-supervised losses that constrain spatial and temporal consistency. To validate the effectiveness of our method, we perform extensive experiments on four benchmark datasets: NMFs-CSL, SLR500, MSASL and WLASL. Experimental results demonstrate that our method achieves state-of-the-art performance on all four benchmarks by a notable margin.
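
The abstract mentions weakly-supervised losses that constrain the temporal consistency of the estimated hand pose, but it does not give their form. As a purely illustrative sketch, one common way to encourage temporal consistency is to penalize frame-to-frame changes in the pose vector. The function name temporal_smoothness_loss, the list-of-lists pose layout, and the squared-difference penalty are all assumptions made here for illustration; they are not taken from the paper.

```python
from typing import Sequence


def temporal_smoothness_loss(poses: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between pose vectors of consecutive frames.

    Hypothetical helper: `poses` holds one pose vector per video frame.
    This is a generic smoothness penalty, not the loss used in the paper.
    """
    if len(poses) < 2:
        return 0.0  # a single frame has no temporal neighbour to compare with

    total = 0.0
    count = 0
    for prev, curr in zip(poses, poses[1:]):
        if len(prev) != len(curr):
            raise ValueError("all pose vectors must have the same length")
        for a, b in zip(prev, curr):
            total += (b - a) ** 2
            count += 1

    # Guard against empty pose vectors to avoid dividing by zero.
    return total / count if count else 0.0


# Toy sequences (made-up numbers): a static pose and one that flips each frame.
steady = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
jittery = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

print(temporal_smoothness_loss(steady))   # static pose gives zero penalty
print(temporal_smoothness_loss(jittery))  # flipping pose gives a larger penalty
```

Under this assumed penalty, a pose sequence that jitters between frames scores higher than a steady one, which is the kind of signal a temporal consistency term can provide without per-frame pose annotations.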

Published

2021-05-18

How to Cite

Hu, H., Zhou, W., & Li, H. (2021). Hand-Model-Aware Sign Language Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1558-1566. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16247

Section

AAAI Technical Track on Computer Vision I