Remote sensing cross-modal image-text retrieval: Key technologies and challenges
- “Remote sensing cross-modal image-text retrieval, as a bridge connecting natural language and remote sensing images, aims to construct efficient bidirectional semantic associations and is a key technology for the intelligent analysis of remote sensing data. Experts comprehensively summarized the technological evolution and research status of the field, analyzed in detail the characteristics of mainstream benchmark datasets, and introduced the commonly used evaluation metrics. They reviewed the technological breakthroughs in text feature representation and remote sensing image feature representation, and analyzed in depth the principles and model characteristics of non-cross-modal pre-training and cross-modal pre-training methods. Through experimental comparison, they revealed the performance advantages of cross-modal pre-training methods and the data adaptation patterns of different fine-tuning strategies. They also summarized the challenges facing current research and discussed future research directions, laying a foundation for the further development of remote sensing cross-modal image-text retrieval in practical applications.”
- Vol. 30, Issue 2, Pages: 262-278 (2026)
Received: 16 October 2025
Published: 07 February 2026
DOI: 10.11834/jrs.20255437
