SYPHAXAR Dataset
- Submitted by: Mohamed Elleuch
- Last updated: Tue, 09/12/2023 - 12:40
- DOI: 10.21227/dpsa-q406
Abstract
SYPHAXAR is a dataset for Arabic text detection in the wild. It was collected in the city of “Sfax”, Tunisia, the second-largest Tunisian city after the capital. A total of 3078 images were gathered manually, one by one, with each image reflecting the text detection challenges of natural scenes as they actually occur along 15 different routes, including ring roads, intersections and roundabouts. The annotated images contain more than 31000 objects, each enclosed within a bounding box. The estimated overall distance covered is around 422 kilometers; all the routes traveled contain commercial and service activities in both the outward and return directions. One notable contribution, and challenge, of the SYPHAXAR dataset is its inclusion of natural-scene text in the Arabic language, encompassing most of the challenges seen in state-of-the-art datasets.
Steps to use this Dataset:
1. Download the “SYPHAXAR.zip” file to your device.
2. Extract the “SYPHAXAR.zip” file to a location of your choice.
3. Inside, the “Images” folder contains all the images of the dataset, and the two folders “Annotations” & “YOLO_Annotations” contain all the annotations (line-level & word-level) of the dataset.
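The steps above can be sketched in Python as follows. This is a minimal illustration, not an official loader: the helper names (`extract_dataset`, `find_folder`, `parse_yolo_line`) are hypothetical, and the parser assumes the files in “YOLO_Annotations” follow the common YOLO text convention of one `class x_center y_center width height` row per object with coordinates normalized to the image size. The dataset page does not confirm that layout, so check a sample file before relying on it.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract the archive (step 2) and return the destination as a Path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def find_folder(root, name):
    """Return the first directory called `name` under `root`, or None.

    The search is recursive because the archive may wrap its folders
    in a top-level directory (an assumption; the page does not say).
    """
    matches = sorted(p for p in Path(root).rglob(name) if p.is_dir())
    return matches[0] if matches else None


def parse_yolo_line(line, img_w, img_h):
    """Convert one assumed YOLO row to (class_id, pixel bounding box).

    Assumed row format: 'class x_center y_center width height',
    with all four coordinates normalized to the range [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, (x_min, y_min, x_max, y_max)
```

For example, after calling `extract_dataset("SYPHAXAR.zip", "data")`, the calls `find_folder("data", "Images")` and `find_folder("data", "YOLO_Annotations")` would locate the image and annotation folders described in step 3, and each line of an annotation file could then be passed to `parse_yolo_line` together with the width and height of the matching image.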