S3Net: Spatiotemporally Separated Sparse Network for Neuromorphic Vision Processing

Authors

  • Ping He Sichuan University
  • Rong Xiao Sichuan University
  • Wanying Xu Sichuan University
  • Chenwei Tang Sichuan University
  • Shudong Huang Sichuan University
  • Huajin Tang Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v40i6.42467

Abstract

Dynamic Vision Sensors (DVS) asynchronously record sparse events triggered by changes in pixel intensity, offering high temporal resolution and low latency. Existing frame-based methods process event data densely, which discards its inherent sparsity and introduces computational redundancy. Asynchronous models preserve the event stream's native format, but they often neglect spatial information, compromising their adaptability and efficiency. To address these limitations, we propose a Spatiotemporally Separated Sparse Network (S3Net) for efficient event stream encoding and learning. Specifically, we employ a learnable sparse encoding scheme to construct a voxel-structured representation that captures spatiotemporal relationships among events. We then propose a dual-branch architecture that captures localized spatial dependencies and dynamic temporal patterns of event data. By explicitly decoupling spatial and temporal modeling, S3Net enables end-to-end asynchronous processing of variable-length event sequences, achieving both strong representational capacity and high computational efficiency. Extensive experiments on six event-based datasets show that S3Net achieves state-of-the-art performance. Compared to frame-based approaches, it reduces computational cost by 35% and model parameters by 27%, and it delivers 1.58× faster inference than existing point-based methods at comparable accuracy.
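
To make the idea of a sparse voxel-structured event representation concrete, the following is a minimal illustrative sketch, not the authors' encoder. It assumes events arrive as rows of (x, y, t, polarity) with polarity in {+1, -1}, and it uses fixed-size binning rather than the learnable encoding described in the abstract. The function name `sparse_voxelize` and the `voxel_size` parameter are hypothetical. Only occupied voxels are stored, so memory scales with the number of active voxels rather than with the full sensor grid.

```python
import numpy as np

def sparse_voxelize(events, voxel_size=(4, 4, 10_000)):
    """Group events into occupied voxels only (hypothetical helper).

    events: (N, 4) array-like of (x, y, t, polarity); t is assumed to be
            in microseconds and polarity in {+1, -1}.
    voxel_size: assumed bin size along (x, y, t).
    Returns (coords, counts, pol_sum) for the occupied voxels.
    """
    events = np.asarray(events)
    # Integer voxel index of every event along x, y and t.
    idx = events[:, :3].astype(np.int64) // np.asarray(voxel_size, dtype=np.int64)
    # Unique rows are the occupied voxels; empty voxels are never materialized.
    coords, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # flatten for consistency across NumPy versions
    # Sum polarity per voxel as a simple hand-crafted feature.
    pol_sum = np.zeros(len(coords), dtype=np.float64)
    np.add.at(pol_sum, inverse, events[:, 3])
    return coords, counts, pol_sum

# Four events: the first two share a voxel, the other two occupy their own.
demo_events = [
    [0, 0, 0, 1],
    [1, 2, 500, -1],
    [5, 0, 100, 1],
    [0, 1, 12_000, 1],
]
coords, counts, pol_sum = sparse_voxelize(demo_events)
```

In this sketch, `coords` lists three occupied voxels out of the whole grid, and `counts` and `pol_sum` carry per-voxel statistics. A learnable scheme would presumably replace these fixed statistics with trained features, but that detail is not specified in the abstract.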

Published

2026-03-14

How to Cite

He, P., Xiao, R., Xu, W., Tang, C., Huang, S., & Tang, H. (2026). S3Net: Spatiotemporally Separated Sparse Network for Neuromorphic Vision Processing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4663–4671. https://doi.org/10.1609/aaai.v40i6.42467

Issue

Section

AAAI Technical Track on Computer Vision III