Pakistani Word-level Sign Language Recognition Based on Deep Spatiotemporal Network

Authors

  • Shehryar Naeem, Heriot-Watt University Dubai
  • Hanan Salam, New York University Abu Dhabi
  • Md Azher Uddin, Heriot-Watt University Dubai

DOI:

https://doi.org/10.1609/aaaiss.v6i1.36042

Abstract

Sign language is crucial for the Deaf and Hard-of-Hearing community because it enables visual, movement-based communication. Nevertheless, most hearing people are not familiar with it, which complicates interaction with the Deaf and Hard-of-Hearing. While there has been significant work on languages such as American and Chinese Sign Language, word-level Pakistani Sign Language (PSL) has received less attention and has mostly been studied using static images. To address this, we introduce a deep spatiotemporal network for word-level PSL recognition from video. First, top-k frame extraction is employed to improve processing efficiency. Second, the ResNet-101 model is utilized to extract deep spatial features from each frame. Third, we introduce the Adaptive Motion Binary Pattern (AMBP), a new spatiotemporal feature descriptor. These spatial and spatiotemporal features are fused and fed into a transformer model that processes the combined representations for improved recognition. Experimental evaluations confirm that our framework achieves state-of-the-art results.
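The abstract's first stage, top-k frame extraction, is not specified in detail on this page. A minimal sketch of one plausible realization is shown below, scoring each frame by its inter-frame motion energy and keeping the k highest-scoring frames in temporal order; the scoring criterion and the function name `top_k_frames` are assumptions for illustration, not the authors' published method.

```python
import numpy as np

def top_k_frames(frames: np.ndarray, k: int) -> np.ndarray:
    """Select the k most motion-salient frames from a clip.

    frames: array of shape (T, H, W) holding grayscale frames.
    Saliency here is the mean absolute difference to the previous
    frame -- an assumed stand-in for the paper's unspecified criterion.
    """
    # Per-frame motion energy: mean |frame_t - frame_{t-1}|
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    scores = diffs.mean(axis=(1, 2))
    # Frame 0 has no predecessor, so give it zero motion energy.
    scores = np.concatenate(([0.0], scores))
    # Indices of the k largest scores, restored to temporal order.
    keep = np.sort(np.argsort(scores)[-k:])
    return frames[keep]
```

Selecting frames this way trims near-duplicate stills before the heavier ResNet-101 feature extraction, which matches the efficiency motivation stated in the abstract.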

Published

2025-08-01

How to Cite

Naeem, S., Salam, H., & Uddin, M. A. (2025). Pakistani Word-level Sign Language Recognition Based on Deep Spatiotemporal Network. Proceedings of the AAAI Symposium Series, 6(1), 119–126. https://doi.org/10.1609/aaaiss.v6i1.36042

Section

Context-Awareness in Cyber-Physical Systems