Pakistani Word-level Sign Language Recognition Based on Deep Spatiotemporal Network

Authors

  • Shehryar Naeem, Heriot-Watt University Dubai
  • Hanan Salam, New York University Abu Dhabi
  • Md Azher Uddin, Heriot-Watt University Dubai

DOI:

https://doi.org/10.1609/aaaiss.v6i1.36042

Abstract

Sign language is crucial for the Deaf and Hard-of-Hearing community because it enables visual, movement-based communication. Nevertheless, most hearing people are not familiar with it, which complicates interaction with the Deaf and Hard-of-Hearing. While there has been significant work on languages such as American and Chinese Sign Language, word-level Pakistani Sign Language (PSL) has received less attention and has mostly been studied using static images. To address this, we introduce a deep spatiotemporal network for word-level PSL recognition from video. First, top-k frame extraction is employed to improve processing efficiency. Second, the ResNet-101 model is utilized to extract deep spatial features from each frame. Third, we introduce the Adaptive Motion Binary Pattern (AMBP), a new spatiotemporal feature descriptor. These spatial and spatiotemporal features are fused and fed into a transformer model that processes the combined representations for improved recognition. Experimental evaluations confirm that our framework achieves state-of-the-art results.
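The abstract's first stage, top-k frame extraction, is not specified in detail on this page. A minimal sketch of one plausible realization is shown below, scoring each frame by its inter-frame motion energy and keeping the k highest-scoring frames in temporal order; the scoring criterion and the function name `top_k_frames` are assumptions for illustration, not the authors' published method.

```python
import numpy as np

def top_k_frames(frames: np.ndarray, k: int) -> np.ndarray:
    """Select the k most motion-salient frames from a clip.

    frames: array of shape (T, H, W) holding grayscale frames.
    Saliency here is the mean absolute difference to the previous
    frame -- an assumed stand-in for the paper's unspecified criterion.
    """
    # Per-frame motion energy: mean |frame_t - frame_{t-1}|
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    scores = diffs.mean(axis=(1, 2))
    # Frame 0 has no predecessor, so give it zero motion energy.
    scores = np.concatenate(([0.0], scores))
    # Indices of the k largest scores, restored to temporal order.
    keep = np.sort(np.argsort(scores)[-k:])
    return frames[keep]
```

Selecting frames this way trims near-duplicate stills before the heavier ResNet-101 feature extraction, which matches the efficiency motivation stated in the abstract.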

Published

2025-08-01

How to Cite

Naeem, S., Salam, H., & Uddin, M. A. (2025). Pakistani Word-level Sign Language Recognition Based on Deep Spatiotemporal Network. Proceedings of the AAAI Symposium Series, 6(1), 119–126. https://doi.org/10.1609/aaaiss.v6i1.36042

Section

Context-Awareness in Cyber-Physical Systems