Efficient Event-Based Semantic Segmentation via Exploiting Frame-Event Fusion: A Hybrid Neural Network Approach

Hebei Li; Yansong Peng; Jiahui Yuan; Peixi Wu; Jin Wang; Yueyi Zhang; Xiaoyan Sun

doi:10.1609/aaai.v39i17.34013

Authors

Hebei Li, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Yansong Peng, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Jiahui Yuan, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Peixi Wu, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Jin Wang, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Yueyi Zhang, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Xiaoyan Sun, MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

Abstract

Event cameras have recently been introduced into image semantic segmentation, owing to their high temporal resolution and other advantageous properties. However, existing event-based semantic segmentation methods often fail to fully exploit the complementary information provided by frames and events, resulting in complex training strategies and increased computational costs. To address these challenges, we propose an efficient hybrid framework for image semantic segmentation, comprising a Spiking Neural Network branch for events and an Artificial Neural Network branch for frames. Specifically, we introduce three specialized modules to facilitate the interaction between these two branches: the Adaptive Temporal Weighting (ATW) Injector, the Event-Driven Sparse (EDS) Injector, and the Channel Selection Fusion (CSF) module. The ATW Injector dynamically integrates temporal features from event data into frame features, enhancing segmentation accuracy by leveraging critical dynamic temporal information. The EDS Injector effectively combines sparse event data with rich frame features, ensuring precise temporal and spatial information alignment. The CSF module selectively merges these features to optimize segmentation performance. Experimental results demonstrate that our framework not only achieves state-of-the-art accuracy across the DDD17-Seg, DSEC-Semantic, and M3ED-Semantic datasets but also significantly reduces energy consumption, achieving a 65% reduction on the DSEC-Semantic dataset.
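
As background for the Spiking Neural Network branch mentioned above, the following is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a commonly used spiking neuron model. The abstract does not state which neuron model the framework uses, so the model choice, the function name `lif_spikes`, and the `threshold` and `decay` values are all illustrative assumptions rather than the authors' implementation. The sketch only shows why sparse event input tends to produce sparse spike activity, which is the usual argument for the lower energy cost of spiking networks.

```python
def lif_spikes(inputs, threshold=1.0, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron (hypothetical helper).

    `inputs` is a sequence of input currents, one per time step.
    Returns a list of 0/1 spikes of the same length.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous membrane potential, then integrate the new input.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after a spike (an assumed reset rule)
        else:
            spikes.append(0)
    return spikes


# Sparse, event-like input: mostly zeros with two short bursts of activity.
event_input = [0.0, 0.6, 0.8, 0.0, 0.0, 0.0, 1.2, 0.0]
spike_train = lif_spikes(event_input)
firing_rate = sum(spike_train) / len(spike_train)
print(spike_train, firing_rate)
```

With this input the neuron fires only at the two steps where accumulated input crosses the threshold, so downstream computation is needed on a small fraction of time steps.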
