DOTA

Name: DOTA
Creator: Jing Gu
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Remote Sensing

Citation Author(s): Gui Song Xia, Xiang Bai, Jian Ding, Zhen Zhu
Submitted by: Jing Gu
Last updated: Sat, 04/12/2025 - 13:16
DOI: 10.21227/wwrj-3d46
Research Article Link: DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Categories: Remote Sensing
Keywords: artificial intelligence; deep learning; remote sensing

Abstract

Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to carry over to aerial imagery, not only because of the huge variation in the scale, orientation, and shape of object instances on the earth's surface, but also because of the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2,806 aerial images from different sensors and platforms. Each image is about 4000×4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA represents real Earth Vision applications well and is quite challenging.

Instructions:

The DOTA (Dataset for Object deTection in Aerial images) dataset is a large-scale benchmark for oriented object detection in aerial imagery. It comprises 2,806 high-resolution images, divided into train, validation, and test splits, with 188,282 annotated instances across 15 categories (e.g., planes, ships, and vehicles). Because the images are large, they are typically cropped into smaller sub-images (e.g., 1024×1024 pixels) before being fed to a model.

Annotations are provided as .txt files in the labelTxt/ folder, one file per image. Each line describes one object instance: the four vertex coordinates of its quadrilateral (x1 y1 x2 y2 x3 y3 x4 y4) in clockwise order, followed by the category name and a difficulty flag (0 or 1).

Researchers must preprocess the images into patches and convert the oriented bounding box (OBB) coordinates into the format their framework expects (e.g., YOLO-OBB or an MMDetection-compatible format). For fair comparison, official evaluation must use the DOTA evaluation toolkit. Note: test-set labels are withheld for benchmarking; users should adhere to the predefined splits and avoid including background-only (no-object) patches during training.
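The labelTxt line format described in the instructions can be read with a small parser. The following is a minimal sketch, not part of any official toolkit; the `DotaObject`, `parse_label_line`, and `parse_label_file` names are hypothetical. It assumes each object line has exactly ten whitespace-separated fields (eight coordinates, category, difficulty flag) and skips any other line, such as blank lines or metadata header lines that some label files may contain.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DotaObject:
    """One annotated instance from a DOTA labelTxt file (hypothetical helper type)."""
    polygon: List[Tuple[float, float]]  # four (x, y) vertices in file order (clockwise)
    category: str                        # e.g. "plane", "ship", "small-vehicle"
    difficult: int                       # difficulty flag: 0 or 1


def parse_label_line(line: str) -> Optional[DotaObject]:
    """Parse one labelTxt line; return None for lines that are not object lines."""
    tokens = line.split()
    # Assumption: object lines have exactly 10 tokens (8 coords, category, flag).
    # Anything else, e.g. a metadata header line if present, is skipped.
    if len(tokens) != 10:
        return None
    try:
        coords = [float(t) for t in tokens[:8]]
        difficult = int(tokens[9])
    except ValueError:
        return None
    # Pair up x1 y1 x2 y2 ... into (x, y) vertices.
    polygon = list(zip(coords[0::2], coords[1::2]))
    return DotaObject(polygon=polygon, category=tokens[8], difficult=difficult)


def parse_label_file(path: str) -> List[DotaObject]:
    """Read every object instance from one labelTxt file."""
    objects = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            obj = parse_label_line(line)
            if obj is not None:
                objects.append(obj)
    return objects
```

For example, `parse_label_line("10 20 30 20 30 40 10 40 ship 1")` (illustrative values, not taken from the dataset) yields a `ship` instance with `difficult == 1` and vertices `(10.0, 20.0)`, `(30.0, 20.0)`, `(30.0, 40.0)`, `(10.0, 40.0)`.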
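Converting an annotation to YOLO-OBB style mostly means replacing the category name with a class index and normalizing the eight vertex coordinates by the image (or patch) size. The sketch below assumes a label layout of `class_index x1 y1 x2 y2 x3 y3 x4 y4` with coordinates in [0, 1]; check it against the documentation of the framework you actually use. The class order in `DOTA_V1_CLASSES` and the `to_yolo_obb_line` name are assumptions made for illustration, so align the list with your own training configuration.

```python
from typing import Dict, Sequence, Tuple

# DOTA-v1.0 category names as written in labelTxt files. The index order is an
# assumption: make it match the class list in your training config.
DOTA_V1_CLASSES = [
    "plane", "ship", "storage-tank", "baseball-diamond", "tennis-court",
    "basketball-court", "ground-track-field", "harbor", "bridge",
    "large-vehicle", "small-vehicle", "helicopter", "roundabout",
    "soccer-ball-field", "swimming-pool",
]
CLASS_TO_INDEX: Dict[str, int] = {name: i for i, name in enumerate(DOTA_V1_CLASSES)}


def to_yolo_obb_line(polygon: Sequence[Tuple[float, float]], category: str,
                     img_w: int, img_h: int) -> str:
    """Format one quadrilateral as 'class x1 y1 ... x4 y4' with coordinates
    normalized by the image (or patch) width and height."""
    if len(polygon) != 4:
        raise ValueError("expected exactly four vertices")
    values = []
    for x, y in polygon:
        # Clamp each coordinate into [0, 1]. This is a simplification: clamping
        # can distort boxes that extend far outside the image.
        values.append(min(max(x / img_w, 0.0), 1.0))
        values.append(min(max(y / img_h, 0.0), 1.0))
    coords = " ".join(f"{v:.6f}" for v in values)
    return f"{CLASS_TO_INDEX[category]} {coords}"
```

For example, a ship occupying the top-left 512×256 region of a 1024×1024 patch becomes `1 0.000000 0.000000 0.500000 0.000000 0.500000 0.250000 0.000000 0.250000` under the assumed class order. Converting for MMDetection-based toolkits usually keeps the pixel-space quadrilateral or turns it into a rotated rectangle instead, so the right conversion depends on the target codebase.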