Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection
DOI:
https://doi.org/10.1609/aaai.v40i15.38217
Abstract
In this paper, we show that current approaches using large square kernels or transformer-based global modeling aggregate contextual information uniformly across spatial dimensions, leading to feature dilution and localization errors for elongated targets. To mitigate this issue, we propose Strip R-CNN, the first work to systematically explore large strip convolutions for remote sensing object detection. Our key insight is that strip convolutions enable directional feature aggregation along the dominant spatial dimension of slender objects, reducing background interference while preserving essential geometric information. We design two core components: (i) StripNet, a backbone network employing sequential orthogonal large strip convolutions to capture anisotropic spatial patterns, and (ii) Strip Head, which enhances localization precision by incorporating strip convolutions into the detection head. Unlike previous large-kernel approaches that suffer from computational redundancy and isotropic limitations, our method achieves superior performance with remarkable efficiency. Extensive experiments on multiple benchmarks (DOTA, FAIR1M, HRSC2016, and DIOR) demonstrate significant improvements, with our 30M parameter model achieving 82.75% mAP on DOTA-v1.0, establishing a new state-of-the-art record while providing new insights into anisotropic feature learning for remote sensing applications.
Published
2026-03-14
How to Cite
Yuan, X., Zheng, Z., Li, Y., Liu, X., Liu, L., Li, X., Hou, Q., & Cheng, M.-M. (2026). Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12259-12267. https://doi.org/10.1609/aaai.v40i15.38217
Section
AAAI Technical Track on Computer Vision XII
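The abstract's core idea, a sequential pair of orthogonal strip convolutions (a 1×k horizontal pass followed by a k×1 vertical pass), can be sketched in pure Python. This is an illustrative assumption of the general mechanism, not the paper's actual StripNet implementation: the function names are hypothetical, and uniform averaging weights stand in for learned kernel parameters.

```python
def strip_conv_1d(grid, k, axis):
    """Average each element with its k-wide neighborhood along one axis
    (zero padding at the borders). axis=0 is vertical, axis=1 horizontal."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for d in range(-r, r + 1):
                ii, jj = (i + d, j) if axis == 0 else (i, j + d)
                if 0 <= ii < h and 0 <= jj < w:
                    acc += grid[ii][jj]
            out[i][j] = acc / k
    return out

def orthogonal_strip_conv(grid, k):
    """Sequential orthogonal strip convolutions: a 1 x k horizontal pass
    followed by a k x 1 vertical pass. The response of an elongated
    structure is aggregated along its dominant direction, using 2k
    weights per channel instead of the k*k of a square kernel."""
    return strip_conv_1d(strip_conv_1d(grid, k, axis=1), k, axis=0)
```

The parameter count is the efficiency argument sketched here: a 1×k plus k×1 pair costs 2k weights per channel, versus k² for an equivalently sized square kernel, while still covering a k×k receptive field.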