Standard Dataset
Urdu Handwritten Ligature Dataset
- Submitted by: AEJAZ GANAI
- Last updated: Mon, 11/04/2024 - 14:34
- DOI: 10.21227/437m-mk79
Abstract
The Urdu Handwritten Ligature Dataset (UHLD) is the first unconstrained handwritten Urdu dataset developed for handwritten Urdu recognition tasks and OCR research. Samples were written with no constraints on paper color, paper type (blank or ruled), ink color, or pen type. The UHLD consists of around six thousand handwritten Urdu text lines written by 200 different writers, both male and female, from different age groups. It covers six- and seven-character ligatures, whereas previous datasets such as UNHD covered ligatures of up to five characters only. A large portion of the dataset is uploaded here; the entire dataset is available from the author upon reasonable request.
Step-1: Collecting your dataset: The UHLD dataset is provided in CSV format as a collection of thousands of ligature images, each one to seven characters long. These ligature images suit a holistic approach to handwritten Urdu text recognition.
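As a starting point for Step-1, the sketch below reads ligature records from a CSV file using only the Python standard library. The page does not describe the CSV column layout, so the layout used here is an assumption: the first column holds a ligature label and the remaining columns hold grayscale pixel values. The function name `load_ligatures` and the sample rows are hypothetical; adjust them after inspecting the actual files.

```python
import csv
import io


def load_ligatures(csv_file, has_header=True):
    """Read (label, pixels) pairs from an open CSV file object.

    Assumed layout (not confirmed by the dataset page): column 0 is the
    ligature label, the remaining columns are grayscale values in 0..255.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    labels, images = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        labels.append(row[0])
        images.append([int(value) for value in row[1:]])
    return labels, images


# Tiny in-memory stand-in for one CSV file (two flattened 2x2 images).
sample = io.StringIO(
    "label,p0,p1,p2,p3\n"
    "L001,0,255,255,0\n"
    "L002,255,255,0,0\n"
)
labels, images = load_ligatures(sample)
print(labels)
print(images)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.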
Step-2: Pre-processing of the images: A substantial portion of the dataset images has already been pre-processed. Researchers can pre-process the ligature images further to suit their own requirements.
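One possible form of further pre-processing is binarization followed by cropping to the ink bounding box. The sketch below is illustrative only and is not the pre-processing already applied to the dataset. It assumes a flattened grayscale image with dark ink on light paper; the function names, the image width, and the threshold of 128 are hypothetical choices.

```python
def to_binary_rows(pixels, width, threshold=128):
    """Binarize a flat grayscale image and reshape it into rows.

    Assumes dark ink on light paper, so values below the threshold
    become 1 (ink) and all others become 0 (background).
    """
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of width")
    binary = [1 if value < threshold else 0 for value in pixels]
    return [binary[i:i + width] for i in range(0, len(binary), width)]


def crop_to_ink(rows):
    """Trim empty margins so only the ink bounding box remains."""
    ink_rows = [r for r, row in enumerate(rows) if any(row)]
    if not ink_rows:
        return rows  # blank (or empty) image: nothing to crop
    ink_cols = [c for c in range(len(rows[0])) if any(row[c] for row in rows)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in rows[top:bottom + 1]]


# A 3x4 image with two dark pixels in the middle row.
flat = [255, 255, 255, 255,
        255, 0, 40, 255,
        255, 255, 255, 255]
rows = to_binary_rows(flat, width=4)
cropped = crop_to_ink(rows)
print(rows)
print(cropped)
```

Because the dataset is unconstrained in paper and ink color, a fixed threshold may not suit every sample; an adaptive threshold could be substituted in `to_binary_rows`.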
Step-3: Model training: The dataset can be used to train deep learning models such as a Convolutional Neural Network or a Vision Transformer based BERT model.
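Before training any of these models, the samples usually need to be divided into training and held-out sets. The sketch below shows a stratified split that keeps each ligature label represented in both sets in roughly the same proportion. It is not taken from `2layerCNN.py`; the function name, the 20% test fraction, and the seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each label split proportionally."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        if len(idxs) > 1:
            # Hold out at least one sample of every label that has two or more.
            n_test = max(1, round(len(idxs) * test_fraction))
        else:
            n_test = 0  # a single sample stays in the training set
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy label list: 10 samples of "a", 5 of "b", 1 of "c".
labels = ["a"] * 10 + ["b"] * 5 + ["c"]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With 200 writers in the dataset, splitting by writer instead of by label may give a stricter estimate of generalization, if writer identifiers are available in the files.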
Step-4: Model evaluation: The dataset can also be used to evaluate models that recognize handwritten Urdu text.
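For a holistic recognizer that predicts one label per ligature image, evaluation can be as simple as overall accuracy plus per-label recall. The sketch below assumes predictions and ground-truth labels are available as parallel lists; the function names and the label strings are hypothetical and not part of the dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_label_recall(y_true, y_pred):
    """For each true label, the share of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy ground truth and predictions for four ligature images.
y_true = ["L001", "L001", "L002", "L003"]
y_pred = ["L001", "L002", "L002", "L003"]
print(accuracy(y_true, y_pred))
print(per_label_recall(y_true, y_pred))
```

Per-label recall is useful here because long ligatures (six or seven characters) are likely to be rarer than short ones, and overall accuracy alone can hide poor performance on them.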
Dataset Files
- Urdu Handwritten Ligature Dataset in PDF/JPG format: UHLD SAMPLES.zip (72.91 MB)
- Urdu Handwritten Ligature Dataset in CSV format: UHLD in CSV format.zip (149.59 MB)
- Script for training and recognition on UHLD using a 2-layer CNN: 2layerCNN.py (6.23 kB)
Documentation
Attachment | Size |
---|---|
The proposed dataset UHLD documentation and collection information | 2.02 MB |