Efficient Event-Based Semantic Segmentation via Exploiting Frame-Event Fusion: A Hybrid Neural Network Approach

Authors

  • Hebei Li, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
  • Yansong Peng, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
  • Jiahui Yuan, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
  • Peixi Wu, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
  • Jin Wang, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
  • Yueyi Zhang, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
  • Xiaoyan Sun, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

DOI:

https://doi.org/10.1609/aaai.v39i17.34013

Abstract

Event cameras have recently been introduced into image semantic segmentation, owing to their high temporal resolution and other advantageous properties. However, existing event-based semantic segmentation methods often fail to fully exploit the complementary information provided by frames and events, resulting in complex training strategies and increased computational costs. To address these challenges, we propose an efficient hybrid framework for image semantic segmentation, comprising a Spiking Neural Network (SNN) branch for events and an Artificial Neural Network (ANN) branch for frames. Specifically, we introduce three specialized modules to facilitate the interaction between these two branches: the Adaptive Temporal Weighting (ATW) Injector, the Event-Driven Sparse (EDS) Injector, and the Channel Selection Fusion (CSF) module. The ATW Injector dynamically integrates temporal features from event data into frame features, enhancing segmentation accuracy by leveraging critical dynamic temporal information. The EDS Injector effectively combines sparse event data with rich frame features, ensuring precise alignment of temporal and spatial information. The CSF module selectively merges these features to optimize segmentation performance. Experimental results demonstrate that our framework not only achieves state-of-the-art accuracy across the DDD17-Seg, DSEC-Semantic, and M3ED-Semantic datasets but also significantly reduces energy consumption, achieving a 65% reduction on the DSEC-Semantic dataset.
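The abstract does not give equations for the spiking branch or the ATW Injector. As a rough illustration only, the sketch below assumes leaky integrate-and-fire (LIF) neurons for the event branch, and assumes that adaptive temporal weighting means a softmax over per-time-step scores whose weighted sum of spike features is added to the frame features. The functions `lif_spikes`, `softmax`, and `temporal_weighted_inject` and the parameters `tau`, `v_th`, and `w_score` are hypothetical names, not the authors' code, and real implementations operate on convolutional feature maps rather than plain vectors.

```python
import math
from typing import List

Vector = List[float]


def lif_spikes(inputs: List[Vector], tau: float = 2.0, v_th: float = 1.0) -> List[Vector]:
    """Hypothetical LIF layer with hard reset to 0.

    inputs[t][c] is the input current to channel c at time step t.
    Returns binary spikes (0.0 or 1.0) with the same shape.
    """
    channels = len(inputs[0])
    v = [0.0] * channels  # membrane potential per channel
    spikes: List[Vector] = []
    for x_t in inputs:
        s_t: Vector = []
        for c in range(channels):
            # Leaky integration toward the input: v <- v + (x - v) / tau
            v[c] = v[c] + (x_t[c] - v[c]) / tau
            if v[c] >= v_th:
                s_t.append(1.0)
                v[c] = 0.0  # hard reset after a spike
            else:
                s_t.append(0.0)
        spikes.append(s_t)
    return spikes


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weighted_inject(frame_feat: Vector, spikes: List[Vector], w_score: Vector) -> Vector:
    """Hypothetical ATW-style injection (an assumption, not the paper's module).

    Each time step is scored by the dot product of its spike vector with
    w_score; scores are softmax-normalised across time, and the weighted
    sum of spike vectors is added channel-wise to the frame feature.
    """
    scores = [sum(s * w for s, w in zip(s_t, w_score)) for s_t in spikes]
    alphas = softmax(scores)  # one weight per time step, summing to 1
    channels = len(frame_feat)
    event_feat = [sum(a * s_t[c] for a, s_t in zip(alphas, spikes)) for c in range(channels)]
    return [f + e for f, e in zip(frame_feat, event_feat)]


if __name__ == "__main__":
    # Three time steps, two channels of event-derived input current.
    spk = lif_spikes([[2.0, 1.5], [2.0, 1.5], [2.0, 1.5]])
    print(spk)
    print(temporal_weighted_inject([0.5, 0.5], spk, [0.0, 0.0]))
```

With zero scores every time step gets weight 1/3, so the injected event feature is simply the mean firing rate per channel; learned scores would instead let the informative time steps dominate while the normalisation keeps the injected magnitude bounded by the spike values. This is one plausible reading of the abstract's description, not a reproduction of it.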

Published

2025-04-11

How to Cite

Li, H., Peng, Y., Yuan, J., Wu, P., Wang, J., Zhang, Y., & Sun, X. (2025). Efficient Event-Based Semantic Segmentation via Exploiting Frame-Event Fusion: A Hybrid Neural Network Approach. Proceedings of the AAAI Conference on Artificial Intelligence, 39(17), 18296–18304. https://doi.org/10.1609/aaai.v39i17.34013

Section

AAAI Technical Track on Machine Learning III