AirWino: Optimized Winograd Convolution for Accelerating CNN Inference on ARMv8 Processors

Authors

  • Haoyuan Gui University of the Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences
  • Xiaoyu Zhang University of the Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences
  • Yifan Zhang University of the Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences
  • Ximeng Fu University of the Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences
  • Shiqi Sun University of the Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences
  • Leisheng Li Institute of Software, Chinese Academy of Sciences, Key Laboratory of System Software, Institute of Software, Chinese Academy of Sciences
  • Huiyuan Li Institute of Software, Chinese Academy of Sciences, Key Laboratory of System Software, Institute of Software, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v40i26.39288

Abstract

As Convolutional Neural Networks (CNNs) continue to gain traction in deep learning, Winograd convolution has emerged as a key algorithm to enhance computational efficiency. Although ARM-based CPUs are increasingly prevalent in mobile devices, embedded systems and HPC servers, existing 2D Winograd convolution implementations for ARM often leave room for improvement in transformation efficiency, computational throughput, and overall versatility. Furthermore, the lack of tailored 3D Winograd convolution implementations for ARM architectures stems from the additional complexity of supporting higher-dimensional kernels. AirWino introduces a set of novel optimizations covering transformations, data layouts, micro-kernel computations, and parallelization strategies for both 2D and 3D Winograd convolution. It supports FP32 and FP16 precisions with filter sizes of 3 and 5, targeting a broad range of applications. Evaluations on four distinct ARM platforms show that AirWino consistently outperforms state-of-the-art libraries across various experimental scenarios and hardware configurations, highlighting its efficiency and portability.
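
For readers unfamiliar with the underlying algorithm, the following is a minimal sketch of the textbook 1D Winograd transform F(2,3), which produces two outputs of a 3-tap filter using 4 multiplications instead of the 6 required by direct computation. It is not taken from AirWino: the function names `winograd_f23` and `direct_conv` are hypothetical, and the code illustrates only the general idea of trading multiplications for additions, not the paper's transformations, data layouts, or micro-kernels.

```python
def direct_conv(d, g):
    """Direct 3-tap correlation over a 4-element input tile: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Textbook Winograd F(2,3): same two outputs with 4 multiplications."""
    # Filter transform (can be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2.0
    u2 = (g[0] - g[1] + g[2]) / 2.0
    u3 = g[2]

    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 multiplications.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(direct_conv(tile, taps))
    print(winograd_f23(tile, taps))
```

The 2D and 3D variants referred to in the abstract nest this 1D scheme along each spatial dimension, which is where the transformation cost and data-layout choices become significant.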

Published

2026-03-14

How to Cite

Gui, H., Zhang, X., Zhang, Y., Fu, X., Sun, S., Li, L., & Li, H. (2026). AirWino: Optimized Winograd Convolution for Accelerating CNN Inference on ARMv8 Processors. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21414–21422. https://doi.org/10.1609/aaai.v40i26.39288

Section

AAAI Technical Track on Machine Learning III