Multi-Modal Hand-to-Mouth Gesture Recognition in Activity-Oriented RGB-Thermal Footage (Student Abstract)

Authors

  • Glenn Fernandes, Northwestern University, Chicago, IL
  • Meixi Lu, Northwestern University, Chicago, IL
  • Farzad Shahabi, Northwestern University, Chicago, IL
  • Jiayi Zheng, Northwestern University, Chicago, IL
  • Aggelos Katsaggelos, Northwestern University, Chicago, IL
  • Nabil Alshurafa, Northwestern University, Chicago, IL

DOI:

https://doi.org/10.1609/aaai.v39i28.35254

Abstract

Health-risk behaviors such as overeating and smoking have a profound impact on public health, making their monitoring and mitigation critical. Wearable RGB-Thermal cameras are being employed to monitor these behaviors by capturing hand-to-mouth (HTM) gestures, which are central to them. However, detection models relying on a single modality—either RGB or thermal—often struggle to accurately distinguish these confounding gestures due to inherent sensor limitations, such as sensitivity to lighting conditions or thermal occlusions. We present a family of fusion models that integrate RGB and thermal video data using early-, decision-, and a novel mid-fusion architecture, RGB-Thermal Fusion Video Network (RTFVNet), designed to enhance the recognition of HTM gestures associated with eating and smoking. Our evaluation shows that while decision fusion achieves the highest F1-score of 88% (0.44 TFLOPs), RTFVNet offers an optimal balance between performance (85%) and complexity (0.37 TFLOPs) for gesture classification of eating, smoking, and non-gesture activities.
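The three fusion points named in the abstract differ only in *where* the two modality streams are combined. The sketch below illustrates this with hypothetical stub encoders and classifiers; it is not RTFVNet's actual architecture, only a minimal stdlib-Python illustration of early, mid, and decision fusion over a toy three-class problem (eating, smoking, non-gesture).

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def early_fusion(rgb_input, thermal_input, model):
    # Fuse at the input: concatenate modalities, run a single network.
    # `model` is a hypothetical classifier over the joint input.
    return model(rgb_input + thermal_input)

def mid_fusion(rgb_input, thermal_input, rgb_encoder, thermal_encoder, head):
    # Fuse intermediate features: separate per-modality encoders,
    # concatenated features, one shared classification head
    # (the general shape of a mid-fusion design such as RTFVNet).
    feats = rgb_encoder(rgb_input) + thermal_encoder(thermal_input)
    return head(feats)

def decision_fusion(rgb_probs, thermal_probs):
    # Fuse at the output: average per-modality class probabilities.
    return [(a + b) / 2 for a, b in zip(rgb_probs, thermal_probs)]

# Toy decision-fusion example with made-up logits.
rgb_probs = softmax([2.0, 0.5, 0.1])      # RGB branch favors class 0
thermal_probs = softmax([1.5, 1.0, 0.2])  # thermal branch agrees
fused = decision_fusion(rgb_probs, thermal_probs)
predicted = max(range(3), key=lambda i: fused[i])
```

Decision fusion runs two full networks (hence its higher 0.44 TFLOPs cost in the paper's evaluation), whereas mid fusion shares the classification head, which is one way to trade a little accuracy for lower compute.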

Published

2025-04-11

How to Cite

Fernandes, G., Lu, M., Shahabi, F., Zheng, J., Katsaggelos, A., & Alshurafa, N. (2025). Multi-Modal Hand-to-Mouth Gesture Recognition in Activity-Oriented RGB-Thermal Footage (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29368–29370. https://doi.org/10.1609/aaai.v39i28.35254