Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection

Authors

  • Xinbin Yuan VCIP, School of Computer Science, NKU
  • Zhaohui Zheng VCIP, School of Computer Science, NKU
  • Yuxuan Li VCIP, School of Computer Science, NKU
  • Xialei Liu VCIP, School of Computer Science, NKU
  • Li Liu Academy of Advanced Technology Research of Hunan, Changsha, China
  • Xiang Li VCIP, School of Computer Science, NKU; NKIARI, Futian, Shenzhen, China
  • Qibin Hou VCIP, School of Computer Science, NKU; NKIARI, Futian, Shenzhen, China
  • Ming-Ming Cheng VCIP, School of Computer Science, NKU; NKIARI, Futian, Shenzhen, China

DOI:

https://doi.org/10.1609/aaai.v40i15.38217

Abstract

In this paper, we show that current approaches using large square kernels or transformer-based global modeling aggregate contextual information uniformly across spatial dimensions, which dilutes features and causes localization errors for elongated targets. To mitigate this issue, we propose Strip R-CNN, the first work to systematically explore large strip convolutions for remote sensing object detection. Our key insight is that strip convolutions enable directional feature aggregation along the dominant spatial dimension of slender objects, reducing background interference while preserving essential geometric information. We design two core components: (i) StripNet, a backbone network that employs sequential orthogonal large strip convolutions to capture anisotropic spatial patterns, and (ii) Strip Head, which improves localization precision by incorporating strip convolutions into the detection head. Unlike previous large-kernel approaches, which suffer from computational redundancy and isotropic limitations, our method achieves superior performance with remarkable efficiency. Extensive experiments on multiple benchmarks (DOTA, FAIR1M, HRSC2016, and DIOR) demonstrate significant improvements: our 30M-parameter model achieves 82.75% mAP on DOTA-v1.0, a new state-of-the-art result, while also providing new insights into anisotropic feature learning for remote sensing applications.
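The abstract's central mechanism, sequential orthogonal strip convolutions, can be illustrated with a minimal NumPy sketch. This is not the authors' StripNet code. The depthwise (one filter per channel) layout, the zero "same" padding, the helper names `depthwise_conv2d` and `strip_block`, and the kernel size `k` are all hypothetical choices made for illustration. The sketch shows two things: a 1 x k pass followed by a k x 1 pass keeps the spatial size while covering a k x k receptive field, and it uses 2k weights per channel instead of the k² a square kernel needs.

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """Depthwise 2D cross-correlation with zero 'same' padding.

    x: array of shape (C, H, W); kernel: array of shape (C, kh, kw), one filter
    per channel. kh and kw are assumed odd so the output keeps the input size.
    """
    c, h, w = x.shape
    _, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Add each kernel tap times the correspondingly shifted input window.
            out += kernel[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out

def strip_block(x, k_horizontal, k_vertical):
    """Hypothetical strip block: a 1 x k pass, then an orthogonal k x 1 pass."""
    return depthwise_conv2d(depthwise_conv2d(x, k_horizontal), k_vertical)

# Usage with illustrative (assumed) sizes.
rng = np.random.default_rng(0)
C, H, W, k = 4, 32, 32, 11
x = rng.standard_normal((C, H, W))
y = strip_block(x, rng.standard_normal((C, 1, k)), rng.standard_normal((C, k, 1)))
print(y.shape)  # (4, 32, 32)

# With all-ones strips, the two passes reproduce a k x k box filter exactly,
# so the pair reaches the same k x k neighbourhood as a square kernel.
ones_strip = strip_block(x, np.ones((C, 1, k)), np.ones((C, k, 1)))
box = depthwise_conv2d(x, np.ones((C, k, k)))
print(np.allclose(ones_strip, box))  # True

# Weights per channel: square k x k kernel versus a 1 x k plus k x 1 pair.
print(k * k, 2 * k)  # 121 22
```

The all-ones comparison only demonstrates receptive-field coverage. With learned weights, a strip pair can represent only separable (rank-1) 2D filters, which is one way to read the abstract's point that strip kernels aggregate features directionally rather than uniformly.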


Published

2026-03-14

How to Cite

Yuan, X., Zheng, Z., Li, Y., Liu, X., Liu, L., Li, X., Hou, Q., & Cheng, M.-M. (2026). Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12259-12267. https://doi.org/10.1609/aaai.v40i15.38217

Section

AAAI Technical Track on Computer Vision XII