Cumulant Attention in Vision Transformers (Student Abstract)

Authors

  • Yuto Morimoto, Graduate School of Informatics, Nagoya University
  • Zhipeng Wang, Graduate School of Informatics, Nagoya University
  • Koji Yasuda, Graduate School of Informatics, Nagoya University; Institute of Materials and Systems for Sustainability, Nagoya University

DOI:

https://doi.org/10.1609/aaai.v40i48.42259

Abstract

Transformer models have achieved remarkable success across diverse deep learning fields, including natural language processing (NLP) and computer vision (CV). One drawback of these models is that softmax attention, the core component of the transformer, has time and memory costs that grow quadratically with sequence length. As data scales up, various attempts have been made to overcome this bottleneck. This study proposes a novel attention mechanism, "Cumulant Attention", that systematically balances efficiency and accuracy. The proposal brings a statistical-mechanics perspective to the attention layer, together with a reliable approximation based on the cumulant expansion. The low-order variant reduces computational complexity to linear order, as in linear attention, while retaining the nonlinearity of softmax attention. We evaluate several variants on CV tasks, including image classification with ViT on ImageNet-100 and video classification with ViViT on UCF-101. Experimental results demonstrate that cumulant attention outperforms linear attention and achieves accuracy comparable to softmax attention. These findings validate the effectiveness of our approach and highlight future directions, including scaling to larger models, extending to other modalities, and optimizing implementations for GPU hardware.
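To make the complexity contrast in the abstract concrete, the following NumPy sketch compares standard softmax attention, a generic kernelized linear attention, and a second-order cumulant truncation of the softmax log-normalizer. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not give the details of Cumulant Attention, so the feature map `elu(x) + 1`, the function names, the unscaled scores in the cumulant example, and the choice to approximate only the log-normalizer are all hypothetical choices made here. The truncation relies on the standard identity log E[exp(x)] = κ₁ + κ₂/2 + ..., with x = q · k_j and j drawn uniformly over the keys.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax attention: builds an N x N weight matrix (quadratic in N)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

def feature_map(x):
    """elu(x) + 1: a positive feature map (an assumed choice for illustration)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Kernelized linear attention: never forms the N x N matrix (linear in N)."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V          # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)     # (d,) summary used for normalization
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def log_normalizer_cumulant2(Q, K):
    """Second-order cumulant truncation of log sum_j exp(q . k_j), per query.

    Hypothetical helper: uses log E[exp(x)] ~ kappa_1 + kappa_2 / 2, where
    kappa_1 and kappa_2 are the mean and variance of the unscaled scores q . k_j.
    Both follow from the key mean and covariance, so the cost is linear in N.
    """
    n = K.shape[0]
    mu = K.mean(axis=0)                            # (d,) mean key
    centered = K - mu
    cov = centered.T @ centered / n                # (d, d) key covariance
    kappa1 = Q @ mu                                # mean score per query
    kappa2 = np.einsum("nd,de,ne->n", Q, cov, Q)   # score variance per query
    return np.log(n) + kappa1 + 0.5 * kappa2

rng = np.random.default_rng(0)
N, d = 64, 8
Q = 0.1 * rng.standard_normal((N, d))  # small scores keep the truncation accurate
K = 0.1 * rng.standard_normal((N, d))
V = rng.standard_normal((N, d))

out_softmax = softmax_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)
exact_log_z = np.log(np.exp(Q @ K.T).sum(axis=-1))
approx_log_z = log_normalizer_cumulant2(Q, K)
```

In this sketch the second-order truncation is accurate only when the scores q · k_j are small or close to Gaussian; higher-order cumulants would be needed otherwise, which is presumably where the trade-off between efficiency and accuracy mentioned in the abstract arises.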

Published

2026-03-14

How to Cite

Morimoto, Y., Wang, Z., & Yasuda, K. (2026). Cumulant Attention in Vision Transformers (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41331–41333. https://doi.org/10.1609/aaai.v40i48.42259