Urdu Nastalique

Name: Urdu Nastalique
Creator: Ubaid Rehman
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Other

Citation Author(s): Khawaja Ubaid ur Rehman
Submitted by: Ubaid Rehman
Last updated: Wed, 02/27/2019 - 09:58
DOI: 10.21227/xwy2-d592

Categories:

Other

Abstract

The performance of most classification models depends on the data used for training. The data must be reliable, robust, and meticulously labelled. A systematic approach was designed to build such a data set. The data was collected from a well-known source, the Center for Language Engineering, available at http://www.cle.org.pk. The corpus on that website, used here for prediction, contains Urdu Naskh text with 4,325 lines and 122,284 words in three text files. This corpus was converted into the Jameel Noori Urdu Nastalique font style, keeping the same 4,325 lines and 122,284 words. The context-sensitive nature of Urdu Nastalique poses several challenges. The corpus text was converted into images because ligature segmentation and line segmentation of images are themselves challenging tasks in OCR systems.

Instructions:

1. Extract Urdu Nastalique (All Images) into a folder.
2. Extract Urdu Nastalique (AllSets) into the same folder. This folder will contain seven different sets of ligature classes. Each set contains samples of different ligature classes.
3. Click on Select Image. Now select any image from the Urdu Nastalique (All Images) folder. These images can be used for training and testing.
4. Each ligature class contains 15 samples. We did this for uniformity, better recognition, and in order to distinguish one ligature from another.
5. You can use these ligature classes as the output classes for prediction.
6. Each set contains 161 ligature classes except the last set, i.e. Set 7. Sets 1 to 6 each contain 161 ligature classes.
7. The 161st class contains samples of ligatures from other classes. It contains 4 samples of other ligature classes.
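The steps above describe sets that hold ligature classes, with a fixed number of samples per class. A minimal Python sketch of indexing such a folder for training is given below. The directory layout (root/<set>/<class>/<image file>), the image file extensions, and all function names are assumptions made for illustration; the dataset page does not state how the extracted archives are organised, so adjust the glob pattern to match the actual folders.

```python
from collections import Counter
from pathlib import Path

# Assumed image formats; the dataset page does not name the file type.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def index_ligature_samples(root):
    """Return sorted (image_path, set_name, class_name) triples.

    Assumes a hypothetical layout: root/<set>/<class>/<image file>.
    """
    samples = []
    for image_path in sorted(Path(root).glob("*/*/*")):
        if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
            class_dir = image_path.parent
            samples.append((image_path, class_dir.parent.name, class_dir.name))
    return samples


def samples_per_class(samples):
    """Count images for each (set_name, class_name) pair."""
    return Counter((set_name, class_name) for _, set_name, class_name in samples)


def classes_with_unexpected_count(samples, expected=15):
    """Return the classes whose sample count differs from `expected`."""
    counts = samples_per_class(samples)
    return {key: n for key, n in counts.items() if n != expected}
```

Calling `classes_with_unexpected_count(index_ligature_samples("path/to/folder"))` would list any class that does not hold 15 samples, which is one way to spot the 161st class described in step 7.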
