Advancing Sign Language Recognition: A YOLO v.11-Based Deep Learning Framework for Alphabet and Transactional Hand Gesture Detection
DOI: https://doi.org/10.1609/aaaiss.v6i1.36055

Abstract
Sign language recognition is an essential tool that facilitates communication for people with hearing and speech disabilities. Conventional recognition techniques frequently encounter challenges in real-time performance, resilience, and accuracy owing to variations in hand positions, backgrounds, and lighting conditions. This paper presents a YOLOv11-based deep learning system for recognising American Sign Language (ASL), concentrating on both alphabetic and transactional hand gestures to mitigate these constraints. The model is engineered to operate in real time while maintaining high precision and resilience across varied contexts. The methodology follows a systematic pipeline, commencing with dataset collection and pre-processing, which includes image augmentation, normalisation, and scaling to ensure model generalisation. The YOLOv11 architecture utilises an improved backbone, neck, and detection head for effective feature extraction and classification. Training is enhanced by the AdamW optimiser, a carefully tuned learning rate, and a loss function that integrates box loss, classification loss, and distribution focal loss (DFL). Performance is assessed using precision, recall, mean Average Precision (mAP), and inference speed to guarantee the model's accuracy and efficiency. Experimental findings indicate that the proposed model attains 95.4% precision, 94.8% recall, and 98.1% mAP, markedly surpassing conventional methods. The combination of Grad-CAM with occlusion sensitivity significantly improves model interpretability. This research offers a robust and scalable approach for real-time sign language detection, facilitating enhanced accessibility in communication technologies, assistive devices, and interactive systems.
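The composite training objective described in the abstract (box loss + classification loss + distribution focal loss) can be sketched as a weighted sum of the three terms. This is a minimal illustration only: the weight values below follow common YOLO-style defaults and are assumptions, not hyperparameters reported by the authors.

```python
def composite_loss(box_loss: float, cls_loss: float, dfl_loss: float,
                   w_box: float = 7.5, w_cls: float = 0.5,
                   w_dfl: float = 1.5) -> float:
    """Weighted sum of the three loss terms named in the paper.

    The default weights are illustrative assumptions modelled on
    typical YOLO configurations, not values from this work.
    """
    return w_box * box_loss + w_cls * cls_loss + w_dfl * dfl_loss


# Hypothetical per-batch loss components, for illustration only
total = composite_loss(box_loss=0.12, cls_loss=0.30, dfl_loss=0.08)
print(round(total, 3))  # 1.17
```

In practice, the relative weights control the trade-off between localisation quality (box and DFL terms) and class discrimination (classification term) during training.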
Published
2025-08-01
How to Cite
Elgohr, A. T., Elhadidy, M. S., El-geneedy, M., Akram, S., & Mousa, M. A. A. (2025). Advancing Sign Language Recognition: A YOLO v.11-Based Deep Learning Framework for Alphabet and Transactional Hand Gesture Detection. Proceedings of the AAAI Symposium Series, 6(1), 209–217. https://doi.org/10.1609/aaaiss.v6i1.36055
Section
Human-AI Collaboration: Exploring Diversity of Human Cognitive Abilities and Varied AI Models for Hybrid Intelligent Systems