Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection
DOI:
https://doi.org/10.1609/aaai.v40i15.38217
Abstract
In this paper, we show that current approaches using large square kernels or transformer-based global modeling aggregate contextual information uniformly across spatial dimensions, leading to feature dilution and localization errors for elongated targets. To mitigate this issue, we propose Strip R-CNN, the first work to systematically explore large strip convolutions for remote sensing object detection. Our key insight is that strip convolutions enable directional feature aggregation along the dominant spatial dimension of slender objects, reducing background interference while preserving essential geometric information. We design two core components: (i) StripNet, a backbone network employing sequential orthogonal large strip convolutions to capture anisotropic spatial patterns, and (ii) Strip Head, which enhances localization precision by incorporating strip convolutions into the detection head. Unlike previous large-kernel approaches that suffer from computational redundancy and isotropic limitations, our method achieves superior performance with remarkable efficiency. Extensive experiments on multiple benchmarks (DOTA, FAIR1M, HRSC2016, and DIOR) demonstrate significant improvements, with our 30M parameter model achieving 82.75% mAP on DOTA-v1.0, establishing a new state-of-the-art record while providing new insights into anisotropic feature learning for remote sensing applications.
Published
2026-03-14
How to Cite
Yuan, X., Zheng, Z., Li, Y., Liu, X., Liu, L., Li, X., Hou, Q., & Cheng, M.-M. (2026). Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12259-12267. https://doi.org/10.1609/aaai.v40i15.38217
Section
AAAI Technical Track on Computer Vision XII
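The abstract's core idea, a sequential pair of orthogonal strip convolutions (a 1×k horizontal pass followed by a k×1 vertical pass), can be sketched in pure Python. This is an illustrative assumption of the general mechanism, not the paper's actual StripNet implementation: the function names are hypothetical, and uniform averaging weights stand in for learned kernel parameters.

```python
def strip_conv_1d(grid, k, axis):
    """Average each element with its k-wide neighborhood along one axis
    (zero padding at the borders). axis=0 is vertical, axis=1 horizontal."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for d in range(-r, r + 1):
                ii, jj = (i + d, j) if axis == 0 else (i, j + d)
                if 0 <= ii < h and 0 <= jj < w:
                    acc += grid[ii][jj]
            out[i][j] = acc / k
    return out

def orthogonal_strip_conv(grid, k):
    """Sequential orthogonal strip convolutions: a 1 x k horizontal pass
    followed by a k x 1 vertical pass. The response of an elongated
    structure is aggregated along its dominant direction, using 2k
    weights per channel instead of the k*k of a square kernel."""
    return strip_conv_1d(strip_conv_1d(grid, k, axis=1), k, axis=0)
```

The parameter count is the efficiency argument sketched here: a 1×k plus k×1 pair costs 2k weights per channel, versus k² for an equivalently sized square kernel, while still covering a k×k receptive field.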