S3Net: Spatiotemporally Separated Sparse Network for Neuromorphic Vision Processing

Ping He; Rong Xiao; Wanying Xu; Chenwei Tang; Shudong Huang; Huajin Tang

doi:10.1609/aaai.v40i6.42467

Authors

Ping He, Sichuan University
Rong Xiao, Sichuan University
Wanying Xu, Sichuan University
Chenwei Tang, Sichuan University
Shudong Huang, Sichuan University
Huajin Tang, Zhejiang University

Abstract

The Dynamic Vision Sensor (DVS) asynchronously records sparse events triggered by changes in pixel intensity, offering high temporal resolution and low latency. Existing frame-based methods process event data densely, which discards its inherent sparsity and introduces computational redundancy. Asynchronous models preserve the native format of the event stream, but they often neglect spatial information, which compromises their adaptability and efficiency. To address these limitations, we propose a Spatiotemporally Separated Sparse Network (S3Net) for efficient event stream encoding and learning. Specifically, we employ a learnable sparse encoding scheme to construct a voxel-structured representation that effectively extracts spatiotemporal relationships among events. We then propose a dual-branch architecture that captures localized spatial dependencies and dynamic temporal patterns in the event data. By explicitly decoupling spatial and temporal modeling, S3Net enables end-to-end asynchronous processing of variable-length event sequences, achieving both strong representational capacity and high computational efficiency. Extensive experiments on six event-based datasets show that S3Net achieves state-of-the-art performance. Compared with frame-based approaches, it reduces computational cost by 35% and model parameters by 27%. It also outperforms existing asynchronous approaches in inference speed without compromising accuracy, delivering 1.58× faster inference than existing point-based methods at comparable accuracy.
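
As a rough illustration of what a sparse, voxel-structured event representation can look like, the sketch below groups raw events into a fixed spatiotemporal grid and stores only the occupied cells. It is not the learnable encoding scheme described in the abstract, whose details are not given here; the function name `voxelize_events`, the `(x, y, t, p)` event layout, and the voxel sizes are all assumptions chosen for illustration.

```python
from collections import defaultdict


def voxelize_events(events, voxel_size=(4, 4, 10_000)):
    """Group (x, y, t, p) events into non-empty voxels only.

    voxel_size is (dx, dy, dt) in pixels, pixels, and microseconds.
    These defaults are hypothetical, not values from the paper.
    Returns a dict mapping (ix, iy, it) -> [positive_count, negative_count].
    """
    dx, dy, dt = voxel_size
    voxels = defaultdict(lambda: [0, 0])
    for x, y, t, p in events:
        # Integer division assigns each event to one voxel index.
        key = (x // dx, y // dy, t // dt)
        # Slot 0 counts positive-polarity events, slot 1 negative ones.
        voxels[key][0 if p > 0 else 1] += 1
    # Empty voxels are never created, so the result stays sparse.
    return dict(voxels)


# Four toy events: (x, y, timestamp in microseconds, polarity).
events = [
    (0, 0, 100, 1),
    (1, 2, 900, 1),
    (3, 3, 5_000, -1),
    (9, 1, 12_000, 1),
]
sparse = voxelize_events(events)
print(sparse)
```

Only two voxels are stored for these four events, whereas a dense frame would allocate every cell of the sensor grid regardless of activity. That gap is the kind of redundancy the abstract attributes to frame-based processing.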
