Any2RSI: Controllable Remote Sensing Text-to-Image Generation via Any Control and Enriched Description

Xu Zhang; Jianzhong Huang; Lefei Zhang

doi:10.1609/aaai.v40i15.38283

Authors

Xu Zhang, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University
Jianzhong Huang, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University
Lefei Zhang, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University


Abstract

Recent advances in controllable text-to-image (T2I) generation have achieved impressive results on natural images, but remote sensing (RS) T2I generation remains challenging due to the unique nature of geospatial data. Existing methods struggle to integrate diverse spatial controls and to model complex spatial relationships, and they often fail to maintain semantic consistency with textual descriptions that are typically vague or incomplete. Moreover, because they are limited by small-scale, low-quality datasets, these models produce outputs with inconsistent layouts and unrealistic content. To address these issues, we propose Any2RSI, a flexible framework for controllable RS T2I generation. It features a Cross-Modal Multi-Control Adapter that extracts modality-agnostic embeddings from heterogeneous spatial inputs, enabling precise spatial guidance. To compensate for sparse or ambiguous text prompts, we introduce a VLM-Empowered Enriched Description Generation module, which uses a vision-language model (VLM) to enrich input descriptions with cross-modal semantics and thereby yields more coherent generated images. Furthermore, we present RST2I-110K, a new large-scale dataset of over 115,000 high-quality RS image-text pairs covering diverse scenes, alleviating data scarcity in this domain. Extensive experiments show that Any2RSI achieves state-of-the-art performance on both existing datasets and the new dataset, improving the realism and structural accuracy of generated RS imagery.
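The abstract does not describe the internals of the Cross-Modal Multi-Control Adapter. Purely as an illustration of the general idea of turning heterogeneous spatial controls into one modality-agnostic embedding, the sketch below assumes that each control arrives as a 2D map, that each map is average-pooled to a fixed grid (so maps of different resolutions become comparable), that a per-modality linear projection maps the pooled features into a shared space, and that the available embeddings are averaged. All names (`pool_grid`, `linear`, `MultiControlAdapterSketch`, and the control types `segmentation` and `edge`) are hypothetical and are not taken from Any2RSI; the actual adapter is a learned network whose design is not given here.

```python
from typing import Dict, List

Grid = List[List[float]]


def pool_grid(grid: Grid, cells: int = 2) -> List[float]:
    """Average-pool an H x W control map into a cells x cells grid, flattened.

    Assumes H >= cells and W >= cells so that no pooling window is empty.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for ci in range(cells):
        for cj in range(cells):
            rows = range(ci * h // cells, (ci + 1) * h // cells)
            cols = range(cj * w // cells, (cj + 1) * w // cells)
            vals = [grid[r][c] for r in rows for c in cols]
            out.append(sum(vals) / len(vals))
    return out


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    """Matrix-vector product: one output value per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]


class MultiControlAdapterSketch:
    """Hypothetical stand-in: per-modality projection into a shared space, then averaging."""

    def __init__(self, projections: Dict[str, List[List[float]]], cells: int = 2):
        self.projections = projections  # control type -> weight matrix (shared output dim)
        self.cells = cells

    def encode(self, controls: Dict[str, Grid]) -> List[float]:
        embeddings = []
        for name, grid in controls.items():
            if name not in self.projections:
                raise KeyError(f"no projection registered for control type {name!r}")
            feats = pool_grid(grid, self.cells)  # fixed size regardless of input resolution
            embeddings.append(linear(self.projections[name], feats))
        if not embeddings:
            raise ValueError("at least one control map is required")
        dim = len(embeddings[0])
        # Averaging makes the result independent of which (or how many) controls are supplied.
        return [sum(e[k] for e in embeddings) / len(embeddings) for k in range(dim)]


# Illustrative usage with hand-picked weights (a real adapter would learn them).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
half = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
adapter = MultiControlAdapterSketch({"segmentation": identity, "edge": half})

seg = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # 4x4 map
edge = [[0, 0, 1, 1]] * 4                                       # 4x4 map
seg_only = adapter.encode({"segmentation": seg})
combined = adapter.encode({"segmentation": seg, "edge": edge})
```

Averaging is chosen here only because it accepts any subset of controls without changing the output size; a learned fusion (for example attention over modality tokens) would be a more realistic choice, but the abstract does not say which one Any2RSI uses.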
