WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract)

Authors

  • Ahana Chanda, Trustworthy BiometraVision Lab, IISER Bhopal
  • Akshay Agarwal, Trustworthy BiometraVision Lab, IISER Bhopal

DOI:

https://doi.org/10.1609/aaai.v40i48.42196

Abstract

Accurate identification of mosquito species is crucial for controlling vector-borne diseases, yet visual or acoustic methods alone are often insufficient. We propose a multimodal deep-learning framework that combines high-resolution images with wingbeat audio, using a SwinV2 vision transformer for the images and an Audio Spectrogram Transformer for the audio, so that the two modalities contribute complementary cues. On a six-species dataset, the fused model achieves 97% accuracy, comparable to the best single-modality baseline, and it is designed to improve robustness under noise or environmental variation. These results demonstrate the value of integrating multiple data sources for reliable mosquito surveillance.
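The abstract does not state how the image and audio representations are combined. As a minimal sketch, assuming simple feature-level fusion (concatenating one image embedding and one audio embedding, then applying a linear classifier with a softmax over the six species), a classification head could look like the following. The function names, the tiny embedding sizes, and the random weights are hypothetical placeholders: real SwinV2 and AST encoders produce much larger vectors, and the authors' actual fusion strategy may differ.

```python
import math
import random

NUM_SPECIES = 6  # six-species dataset, as stated in the abstract


def concat_fusion(image_emb, audio_emb):
    """Assumed feature-level fusion: concatenate the two embeddings."""
    return list(image_emb) + list(audio_emb)


def linear(x, weights, bias):
    """Dense layer: one output logit per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_emb, audio_emb, weights, bias):
    """Return (predicted species index, class probabilities)."""
    fused = concat_fusion(image_emb, audio_emb)
    probs = softmax(linear(fused, weights, bias))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical stand-ins for encoder outputs (seeded for reproducibility).
rng = random.Random(0)
image_emb = [rng.gauss(0, 1) for _ in range(8)]  # placeholder for a SwinV2 image embedding
audio_emb = [rng.gauss(0, 1) for _ in range(4)]  # placeholder for an AST wingbeat embedding
dim = len(image_emb) + len(audio_emb)
weights = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(NUM_SPECIES)]
bias = [0.0] * NUM_SPECIES

label, probs = classify(image_emb, audio_emb, weights, bias)
```

Concatenation is only one option; late fusion (averaging per-modality class probabilities) or cross-attention between the two transformers are common alternatives, and the abstract does not say which one the framework uses.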

Published

2026-03-14

How to Cite

Chanda, A., & Agarwal, A. (2026). WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41154–41156. https://doi.org/10.1609/aaai.v40i48.42196