Standard Dataset
DNLL dataset
- Citation Author(s):
- Submitted by: Imene Ouali
- Last updated: Fri, 10/06/2023 - 08:24
- DOI: 10.21227/64gn-r361
- License:
- Categories:
- Keywords:
Abstract
The Numerical Latin Letters (DNLL) dataset consists of images of Latin letters organized into 26 classes, one for each letter of the Latin alphabet. Each class contains multiple letter forms that vary in color, size, writing style, stroke thickness, background, orientation, and luminosity, among other attributes, so the collection covers a wide range of appearances.
DNLL includes only isolated letters and is divided into three files: training, testing, and validation. This split supports developing and evaluating text detection and recognition models. The training set comprises 80% of the total images. The remaining 20% is split between the testing set (80% of the remainder, or 16% of the total) and the validation set (20% of the remainder, or 4% of the total).
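The nested split can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `split_counts` and the total of 10,000 images are hypothetical, since the actual image count is not stated here.

```python
def split_counts(total: int) -> dict[str, int]:
    """Apply the nested 80/20 rule: 80% train, then 80/20 of the rest."""
    # Integer arithmetic avoids floating-point rounding surprises.
    train = total * 80 // 100
    remaining = total - train
    test = remaining * 80 // 100
    # Validation takes whatever is left, so the three parts always sum to total.
    validation = remaining - test
    return {"train": train, "test": test, "validation": validation}


# Hypothetical total, chosen only to make the percentages easy to read.
counts = split_counts(10_000)
print(counts)  # {'train': 8000, 'test': 1600, 'validation': 400}
```

With a total that is not a multiple of 100, floor division rounds the training and testing counts down and the validation set absorbs the difference.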
During the processing stage, the images and data undergo enhancement and augmentation to further enrich the dataset and improve its quality.
Researchers can use this dataset to evaluate their Latin text detection and recognition methods by running their models on it and measuring the resulting accuracy.
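As a rough illustration of such an evaluation, the sketch below computes overall and per-class accuracy from lists of true and predicted labels. It assumes the 26 classes are labeled with the uppercase letters A to Z, which is not confirmed here, and `accuracy_report` is a hypothetical helper rather than anything shipped with the dataset.

```python
import string
from collections import Counter

# Assumption: one class label per uppercase Latin letter.
CLASSES = list(string.ascii_uppercase)


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, per-class accuracy) for letter labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    totals = Counter(y_true)
    # Count correct predictions, keyed by the true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(hits.values()) / len(y_true)
    # Only report classes that actually occur in y_true.
    per_class = {c: hits[c] / totals[c] for c in CLASSES if totals[c]}
    return overall, per_class


# Toy labels, made up for illustration only.
truth = ["A", "A", "B", "C"]
preds = ["A", "B", "B", "C"]
overall, per_class = accuracy_report(truth, preds)
print(overall)    # 0.75
print(per_class)  # {'A': 0.5, 'B': 1.0, 'C': 1.0}
```

Per-class accuracy is worth reporting alongside the overall figure because visually similar letters can be confused even when the aggregate score looks high.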
Dataset Files
- Evaluation File evaluation.zip (5.68 MB)
- Test File test.zip (14.49 MB)
- Train File train.zip (70.89 MB)
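Once downloaded, the archives can be inspected without extracting them. The sketch below counts image files per parent folder using Python's standard `zipfile` module. The internal layout of the archives is not described here, so the idea of one folder per letter class and the list of image extensions are assumptions, and `count_images_by_folder` is a hypothetical helper.

```python
import zipfile
from collections import Counter
from pathlib import Path, PurePosixPath

# Assumption: images use one of these common extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def count_images_by_folder(zip_path):
    """Count image files per immediate parent folder inside a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)  # zip entries use forward slashes
            if path.suffix.lower() in IMAGE_SUFFIXES:
                counts[path.parent.name] += 1
    return counts


# Report on whichever of the three archives are present locally.
for archive_name in ("train.zip", "test.zip", "evaluation.zip"):
    if Path(archive_name).exists():
        print(archive_name, dict(count_images_by_folder(archive_name)))
```

If the archives turn out to use a flat layout, with the class encoded in the file name, the grouping key would need to change accordingly.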