Urdu Nastalique

Citation Author(s): Khawaja Ubaid ur Rehman
Submitted by: Ubaid Rehman
Last updated: Wed, 02/27/2019 - 04:58



The performance of most classification models depends on the data used for training. The data must be reliable, robust and meticulously labelled. To build such a dataset, a systematic approach was designed. The source text was collected from a well-known source, the Center for Language Engineering, available at http://www.cle.org.pk. The corpus used for prediction, available on that website, contains Urdu Naskh data comprising 4,325 lines and 122,284 words in three text files. The corpus was converted into the Jameel Noori Urdu Nastalique font style, retaining the 4,325 lines and 122,284 words. Urdu Nastalique is context sensitive, which poses several challenges. The corpus text was converted into images because, in OCR systems, ligature segmentation and line segmentation of images are themselves challenging tasks.
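The line and word counts quoted above can be checked against the downloaded corpus files with a short script. The following is a minimal sketch, not the author's tooling: it assumes the text files are UTF-8 encoded, that blank lines are not counted, and that words are separated by whitespace. The function name `corpus_stats` is introduced here for illustration only.

```python
def corpus_stats(paths, encoding="utf-8"):
    """Return (line_count, word_count) summed over the given text files.

    Assumptions (not stated in the dataset description): files are
    UTF-8, blank lines are ignored, and words are whitespace-delimited.
    """
    lines = 0
    words = 0
    for path in paths:
        with open(path, encoding=encoding) as handle:
            for line in handle:
                if line.strip():  # skip empty or whitespace-only lines
                    lines += 1
                    words += len(line.split())
    return lines, words


# Example call with hypothetical file names:
#   corpus_stats(["corpus_part1.txt", "corpus_part2.txt", "corpus_part3.txt"])
```

If the totals differ from the figures in the description, the likely causes are a different encoding or a different definition of a word, so both assumptions are worth checking first.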


1. Extract Urdu Nastalique (All Images) into a folder.
2. Extract Urdu Nastalique (AllSets) into the same folder. This folder will contain seven different sets of ligature classes. Each set contains samples of different ligature classes.
3. Click on Select Image, then select any image from the Urdu Nastalique (All Images) folder. These images can be used for training and testing.
4. Each ligature class contains 15 samples. This was done for uniformity, for better recognition, and in order to distinguish one ligature from another.
5. You can use these ligature classes as output classes for prediction.
6. Sets 1 to 6 each contain 161 ligature classes; the last set, Set 7, does not.
7. The 161st class contains samples of other ligature classes; it holds 4 such samples.
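Once the archives are extracted, the sets and ligature classes can be indexed and split for training and testing. The sketch below is only an illustration under stated assumptions: the description does not specify the on-disk layout, so the `root/<set>/<class>/<image>` structure, the list of image extensions, and the function names `index_samples` and `split_per_class` are all hypothetical. The split is done per class so that every ligature class appears in the training portion.

```python
import random
from pathlib import Path

# Assumed image extensions; the dataset description does not name a format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def index_samples(root):
    """Map (set_name, class_name) -> sorted list of image paths.

    Assumes a hypothetical layout root/<set>/<class>/<image>.
    """
    index = {}
    for set_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in set_dir.iterdir() if p.is_dir()):
            images = sorted(
                f for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            if images:
                index[(set_dir.name, class_dir.name)] = images
    return index


def split_per_class(index, train_fraction=0.8, seed=0):
    """Split each class separately into (path, label) train and test lists."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label, images in sorted(index.items()):
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Keep at least one training sample, and leave at least one test
        # sample whenever the class has more than one image.
        cut = round(len(shuffled) * train_fraction)
        cut = max(1, min(len(shuffled) - 1, cut))
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test
```

With the default 80/20 fraction, a class of 15 samples yields 12 training and 3 test images, and a class of 4 samples yields 3 and 1.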

[1] Khawaja Ubaid ur Rehman, "Urdu Nastalique", IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/xwy2-d592. Accessed: Mar. 31, 2020.