Style-Content Metric Learning for Multidomain Remote Sensing Object Recognition

Authors

  • Wenda Zhao, Dalian University of Technology
  • Ruikai Yang, Dalian University of Technology
  • Yu Liu, Tsinghua University
  • You He, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v37i3.25473

Keywords:

CV: Object Detection & Categorization

Abstract

Previous remote sensing recognition approaches predominantly perform well only when training and testing data are drawn from the same dataset. However, due to large style discrepancies not only among multidomain datasets but also within a single domain, they suffer obvious performance degradation when applied to unseen domains. In this paper, we propose a style-content metric learning framework to address generalizable remote sensing object recognition. Specifically, we first design an inter-class dispersion metric that encourages the model to make decisions based on content rather than style; it disperses the predictions generated from the contents of the positive and negative samples combined with the style of the input image. Second, we propose an intra-class compactness metric that forces the model to be less style-biased by compacting the classifier's predictions obtained from the content of the input image combined with the styles of the positive and negative samples. Lastly, we design an intra-class interaction metric that improves recognition accuracy by pulling together the classifier's predictions obtained from the input image and the positive sample. Extensive experiments on four datasets show that our style-content metric learning achieves superior generalization performance against state-of-the-art competitors. Code and model are available at: https://github.com/wdzhao123/TSCM.
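The three metrics described above can be sketched with simple squared-distance surrogates over prediction vectors. This is a minimal illustrative sketch only: the function names, the L2 formulation, and the loss weights are assumptions for exposition, not the paper's exact losses (see the linked repository for the authors' implementation).

```python
import numpy as np

def dispersion_loss(p_pos_content, p_neg_content):
    """Inter-class dispersion (sketch): push apart predictions that share
    the input's style but carry different content (positive vs. negative
    sample). Negated L2 distance, so minimizing it increases separation.
    Hypothetical surrogate, not the paper's exact metric."""
    return -np.mean((p_pos_content - p_neg_content) ** 2)

def compactness_loss(p_pos_style, p_neg_style):
    """Intra-class compactness (sketch): pull together predictions that
    share the input's content but carry different styles, discouraging
    style-biased decisions."""
    return np.mean((p_pos_style - p_neg_style) ** 2)

def interaction_loss(p_input, p_positive):
    """Intra-class interaction (sketch): align the classifier's
    predictions for the input image and its positive sample."""
    return np.mean((p_input - p_positive) ** 2)

def total_loss(preds, w_disp=1.0, w_comp=1.0, w_inter=1.0):
    """Combine the three terms; the equal default weights are an
    assumption for illustration."""
    return (w_disp * dispersion_loss(preds["pos_content"], preds["neg_content"])
            + w_comp * compactness_loss(preds["pos_style"], preds["neg_style"])
            + w_inter * interaction_loss(preds["input"], preds["positive"]))
```

In practice each argument would be a softmax prediction produced by feeding the classifier a style-content recombined image; identical predictions give zero compactness/interaction loss, while the dispersion term rewards predictions that diverge.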

Published

2023-06-26

How to Cite

Zhao, W., Yang, R., Liu, Y., & He, Y. (2023). Style-Content Metric Learning for Multidomain Remote Sensing Object Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3624-3632. https://doi.org/10.1609/aaai.v37i3.25473

Section

AAAI Technical Track on Computer Vision III