Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation

Authors

  • Zhenxin Lei — University of Chinese Academy of Sciences; Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences
  • Man Yao — Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences
  • Jiakui Hu — Institute of Medical Technology, Peking University Health Science Center, Peking University; Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences
  • Xinhao Luo — Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences
  • Yanye Lu — Institute of Medical Technology, Peking University Health Science Center, Peking University; National Biomedical Imaging Center, Peking University
  • Bo Xu — Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences
  • Guoqi Li — Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i2.32126

Abstract

Spiking Neural Networks (SNNs) offer a low-power advantage but perform poorly on image segmentation tasks. The reason is that directly converting neural networks with the complex architectural designs required for segmentation into spiking versions leads to performance degradation and non-convergence. To address this challenge, we first identify the modules in the architecture design that cause the severe reduction in spike firing, make targeted improvements, and propose the Spike2Former architecture. Second, we propose normalized integer spiking neurons to solve the training-stability problem of SNNs with complex architectures. We set a new state of the art for SNNs on several semantic segmentation datasets, with significant improvements of +12.7% mIoU and 5.0x efficiency on ADE20K, +14.3% mIoU and 5.2x efficiency on VOC2012, and +9.1% mIoU and 6.6x efficiency on Cityscapes.
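The abstract's "normalized integer spiking neurons" can be illustrated with a toy single-step update. This is a hedged sketch of the general idea only, not the authors' implementation: the function name `ni_lif_step`, the parameter names, and the specific clip/normalize scheme are assumptions; it merely shows how a neuron could emit an integer spike count bounded by `d_max` and normalize it to keep activations in [0, 1].

```python
import numpy as np

def ni_lif_step(x, v, threshold=1.0, d_max=4):
    """One step of a toy normalized integer LIF neuron (illustrative only).

    Instead of a binary spike, the neuron emits an integer spike count
    clipped to [0, d_max], then divides by d_max so the output stays in
    [0, 1] -- one plausible reading of "normalized integer spiking".
    """
    v = v + x                                            # integrate input into membrane potential
    spikes = np.clip(np.round(v / threshold), 0, d_max)  # integer firing, bounded above
    v = v - spikes * threshold                           # soft reset: subtract what was fired
    return spikes / d_max, v                             # normalized output, updated state

# Example: three neurons with different drive levels
out, v_new = ni_lif_step(np.array([0.5, 1.2, 3.0]), np.zeros(3))
```

A stronger input yields a larger integer count (up to `d_max`), so the normalized output approximates a graded activation while remaining spike-count based, which is one way such a neuron could stabilize training in deep segmentation architectures.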

Published

2025-04-11

How to Cite

Lei, Z., Yao, M., Hu, J., Luo, X., Lu, Y., Xu, B., & Li, G. (2025). Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1364–1372. https://doi.org/10.1609/aaai.v39i2.32126

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems