*.jpg; *.png; *.tif; *.ai; *.opj;

GeoCaption-12K

The scarcity of multimodal datasets in remote sensing, particularly those combining high-resolution imagery with descriptive textual annotations, limits advancements in context-aware analysis. To address this, we introduce a novel dataset comprising 12,473 aerial and satellite images sourced from established benchmarks (RSSCN7, DLRSD, iSAID, LoveDA, and WHU), enriched with automatically generated pseudo-captions and semantic tags.

Categories:

Geoscience and Remote Sensing
Artificial Intelligence
Image Processing
Computer Vision