UAV-ITD: UAV Image Text Detection Dataset

Citation Author(s):
Aishwarya Soni
Tanima Dutta
Submitted by:
Aishwarya Soni
Last updated:
Mon, 01/01/2024 - 03:13
DOI:
10.21227/172w-5r42
Data Format:
License:

Abstract

The Unmanned Aerial Vehicle Image Text Detection (UAV-ITD) dataset is a comprehensive dataset tailored for identifying text in images captured by Unmanned Aerial Vehicles (UAVs). Despite the abundance of natural scene text datasets, there is a significant dearth of specialised datasets for text detection in UAV-captured imagery. The absence of such datasets impedes the progress of deep learning algorithms in effectively understanding and interpreting textual information within aerial images. This study addresses the gap by meticulously curating a dataset that captures the diverse challenges posed by UAV imagery, thereby facilitating advancements in text detection algorithms for aerial applications.

Instructions: 

Introduction:

Unmanned Aerial Vehicles (UAVs) have become indispensable tools for a wide range of applications, including surveillance, disaster response, and environmental monitoring. These UAVs capture vast amounts of visual data, often containing crucial textual information. However, the existing text detection datasets predominantly focus on natural scenes and fail to adequately represent the challenges posed by aerial imagery. To unlock the full potential of UAV-captured data, it is imperative to develop a specialised dataset that mirrors the unique characteristics and complexities associated with identifying text in such images.

Purpose:

The primary objective of this research is to develop a dataset that serves as a benchmark for training and evaluating text detection models specifically designed for UAV imagery. By creating a dataset that encapsulates the distinct challenges of aerial scenes, we aim to foster the development of robust algorithms capable of accurately extracting and understanding textual information from UAV-captured images. This dataset will facilitate the testing and validation of existing text detection methods while also providing a foundation for the creation of novel approaches tailored to the unique characteristics of aerial data.

Description:

The dataset encompasses a diverse range of UAV-captured images covering various terrains, lighting conditions, and text styles, including horizontal, vertical, curved, and arbitrarily shaped text. The text instances include building names, road hoardings, billboards, and more. Emphasis is placed on scenarios with challenging factors such as oblique text orientation, variable font sizes, and text occluded by environmental elements. Each text instance is annotated with a precise boundary at word-level granularity. The dataset contains a total of 1000 images, split into 700 training and 300 testing images. The images were collected from Google, captured with camera-fitted drones, and drawn from existing datasets such as the VisDrone dataset. Annotation was performed on the Roboflow platform, using polygon annotations for text instance boundaries. Image resolutions vary from 960x540 to 2000x1500 pixels. After collection, we preprocessed the images by filtering out those that were blurred, of low quality, or otherwise unsuitable. Polygons are used to bind ground-truth words tightly. In addition, rectangular bounding-box annotations are included for horizontal and vertical text, since most current algorithms produce rectangular outputs; a conversion sketch follows this paragraph. The dataset also contains multilingual text. We further augment the images, applying a horizontal flip with 50% probability followed by a clockwise rotation. Images are stored as *.jpg files and their annotations as *.txt files.
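
Because polygon and rectangular annotations coexist, an axis-aligned box can be derived from any polygon by taking the coordinate extrema. The following minimal Python sketch illustrates this; the in-memory polygon representation is our own assumption, not part of the released files.

def polygon_to_rect(polygon):
    """Derive an axis-aligned rectangle (x_min, y_min, x_max, y_max)
    from a polygon given as a list of (x, y) vertex pairs."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

# Example: a slanted four-point text region
print(polygon_to_rect([(10, 40), (120, 30), (125, 60), (15, 70)]))
# -> (10, 30, 125, 70)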

Importance:

The scarcity of datasets catering specifically to text recognition in UAV imagery hampers the advancement of machine learning models in this domain. A specialised dataset will play a pivotal role in fostering innovation and progress in developing algorithms capable of extracting valuable information from UAV-captured data. The implications are vast, ranging from improved situational awareness in military and surveillance applications to enhanced disaster response capabilities and efficient agricultural monitoring. This dataset aims to bridge the existing gap and pave the way for more accurate and reliable text recognition solutions in the dynamic realm of aerial imagery.

Description Instructions:

The dataset comprises a diverse collection of UAV-captured images, featuring a wide range of terrains, lighting conditions, and text styles. The dataset includes horizontal, vertical, curved, and arbitrarily shaped text, encompassing building names, road hoardings, billboards, and more. Special attention is given to challenging scenarios, incorporating factors such as oblique text orientation, variable font sizes, and text occlusions by environmental elements.

Annotations:

- Each image is annotated with precise polygon boundaries around individual text instances, at word-level granularity (a parsing sketch follows this list).

- The dataset consists of a total of 1000 images, with 700 designated for training and 300 for testing.

- Image resolutions vary from 960x540 to 2000x1500 pixels.
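
The exact line layout of the annotation files is not documented on this page. The sketch below assumes the common scene-text convention of comma-separated vertex coordinates followed by the word transcript, one text instance per line; treat the field order as an assumption to verify against the actual files.

def parse_annotation_line(line):
    """Parse one assumed-format line 'x1,y1,...,xn,yn,word' into
    (polygon, transcript); the real UAV-ITD layout may differ."""
    fields = line.strip().split(",")
    coords = [float(v) for v in fields[:-1]]   # all but the last field
    polygon = list(zip(coords[0::2], coords[1::2]))
    return polygon, fields[-1]

polygon, word = parse_annotation_line("10,40,120,30,125,60,15,70,EXIT")
# polygon == [(10.0, 40.0), (120.0, 30.0), (125.0, 60.0), (15.0, 70.0)], word == 'EXIT'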

Data Collection:

- Images were gathered from Google, captured with camera-fitted drones, and supplemented with existing datasets such as the VisDrone dataset.

- Annotation was performed using the Roboflow platform, utilising polygon annotations for text instance boundaries.

Preprocessing:

- Preprocessing involves filtering out blurred, low-quality, and otherwise unsuitable images (a blur-check sketch follows this list).

- Ground truth words are tightly bound using polygon shapes.

- Rectangular bounding box annotations are included for horizontal and vertical text, accommodating algorithms that generate rectangular outputs.
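
The page does not state which quality criteria were applied during filtering. One widely used blur heuristic is the variance of the Laplacian, shown below as an illustrative sketch (the threshold is an assumption and needs tuning per dataset), not as the authors' actual procedure.

import cv2  # OpenCV

def is_sharp(image_path, threshold=100.0):
    """Blur heuristic: variance of the Laplacian. Low variance means
    few strong edges, which usually indicates a blurred image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:          # unreadable or corrupt file
        return False
    return cv2.Laplacian(img, cv2.CV_64F).var() >= threshold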

Multilingual Consideration:

- The dataset encompasses multilingual text, offering diversity in linguistic content.

Augmentation:

- Images undergo augmentation, including a 50% probability of horizontal flips followed by clockwise rotation.
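
A minimal sketch of that pipeline in Python with Pillow. The 90-degree rotation angle is an assumption (the page only says "clockwise"), and in a real pipeline the polygon coordinates would have to be transformed alongside the pixels, which is omitted here.

import random
from PIL import Image

def augment(img, angle=90):
    """Apply a horizontal flip with 50% probability, then a clockwise
    rotation. The angle is an assumption; annotations must be
    transformed with the same operations (omitted here)."""
    if random.random() < 0.5:
        img = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    # PIL rotates counter-clockwise for positive angles, so negate it;
    # expand=True grows the canvas so nothing is cropped.
    return img.rotate(-angle, expand=True)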

File Format:

- Image files are in JPEG format (*.jpg), while annotation files are plain text (*.txt); a sketch for pairing the two follows.
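
Assuming each image shares its filename stem with its annotation file (the actual directory layout is not specified here, and the path below is illustrative), pairs can be enumerated like this:

from pathlib import Path

def list_samples(root):
    """Yield (image, annotation) path pairs, assuming matching stems."""
    for img_path in sorted(Path(root).rglob("*.jpg")):
        ann_path = img_path.with_suffix(".txt")
        if ann_path.exists():        # skip images without annotations
            yield img_path, ann_path

for img, ann in list_samples("UAV-ITD/train"):
    print(img, "->", ann)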

These instructions provide an overview of the dataset's composition, annotation methodologies, collection sources, preprocessing steps, and considerations for multilingual content and augmentation.

This file presents a comprehensive dataset for advancing the field of computer vision, specifically text detection in imagery captured by Unmanned Aerial Vehicles (UAVs). The dataset comprises 1,013 high-resolution images capturing street views from low altitudes, offering a unique perspective that includes entire streets, buildings, and prominent billboards. The dataset has been annotated with polygonal bounding boxes using Roboflow, providing robust ground truth for training and evaluation.