Vision-MoR: Scaling Vision Transformer via Patch-Level Mixture-of-Recursions
DOI:
https://doi.org/10.1609/aaai.v40i6.42471
Abstract
Scaling Vision Transformers (ViTs) has yielded remarkable advancements in diverse vision tasks, albeit at the cost of escalating computational, memory, and parameter demands. Existing efficiency techniques typically address only one of these dimensions (computation, memory, or parameters), lacking a cohesive approach. In this paper, we introduce Vision-MoR, a novel ViT architecture that unifies parameter sharing, spatially adaptive computation, and memory-efficient design into a single framework. Vision-MoR employs a spatial-aware router with shifted-window attention to dynamically assign per-patch recursion depths, coupled with a recursive Transformer loop that enables token-wise early exiting. This facilitates content-adaptive processing and recursive parameter reuse while preserving spatial locality. On ImageNet-1K, Vision-MoR Small attains 74.6% Top-1 accuracy with 140M FLOPs and 5.7M parameters, outperforming EfficientViT-M2 (70.8%) and SHViT-S1 (72.8%) at superior throughput. The Vision-MoR X-Large variant achieves 80.4% Top-1 and 95.2% Top-5 accuracy using 14.3M parameters and 2044M FLOPs, surpassing ResNet-50 and EfficientNet-B1. On COCO object detection, Vision-MoR X-Large yields 39.1 AP with the lowest latency among comparable models. These results underscore Vision-MoR's state-of-the-art accuracy-efficiency trade-offs, positioning it as a scalable, deployment-friendly backbone for real-time vision applications.
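The abstract's core mechanism (a router assigning each patch a recursion depth, then a weight-tied Transformer block applied repeatedly with token-wise early exit) can be illustrated with a minimal toy sketch. This is not the paper's implementation: the router here is a hypothetical random-logit stand-in, the shared block is reduced to a single tanh projection, and the shifted-window attention of the actual spatial-aware router is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 6 patch tokens of dim 4, maximum recursion depth 3 (all values illustrative).
num_tokens, dim, max_depth = 6, 4, 3
W = rng.standard_normal((dim, dim)) * 0.1
tokens = rng.standard_normal((num_tokens, dim))

def shared_block(x):
    # Stand-in for one Transformer block whose weights are reused (tied) across recursions.
    return np.tanh(x @ W)

# Hypothetical router: in the paper this is a learned spatial-aware module;
# here random logits simply assign each token a depth in [1, max_depth].
router_logits = rng.standard_normal((num_tokens, max_depth))
depths = router_logits.argmax(axis=1) + 1

# Recursive loop with token-wise early exit: a token stops being updated
# once its assigned recursion depth is reached, so shallow tokens cost fewer FLOPs.
out = tokens.copy()
for step in range(1, max_depth + 1):
    active = depths >= step          # boolean mask of tokens still recursing
    out[active] = shared_block(out[active])

# Fraction of block applications skipped relative to running every token to max_depth.
compute_saved = 1 - depths.sum() / (num_tokens * max_depth)
print(depths, round(float(compute_saved), 2))
```

Parameter reuse comes from calling the same `shared_block` at every step, while adaptive computation comes from the shrinking `active` mask; the saving reported at the end is exactly the fraction of per-token block calls avoided.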
Published
2026-03-14
How to Cite
He, Y., Yuan, Z., Sun, W., Li, Y., Liu, Y., Ye, Y., & Sun, L. (2026). Vision-MoR: Scaling Vision Transformer via Patch-Level Mixture-of-Recursions. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4699-4707. https://doi.org/10.1609/aaai.v40i6.42471
Issue
Section
AAAI Technical Track on Computer Vision III