MODI-HChar: Historical MODI Script Handwritten Character Dataset

Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon (MS), India
Manisha Deshmukh
Tue, 06/06/2023 - 05:41
MODI script has been used to write Indian languages such as Marathi, Hindi, and Gujarati since the 12th century. From the 17th century to the mid-19th century, MODI was the administrative script of Maharashtra state (India). Today the number of MODI script users is dwindling, and only a handful of people can still read it. Archaic handwritten MODI documents hold important and rare cultural, historical, and administrative information that remains useful today.

Training and testing a machine learning system requires a standard, invariant character dataset. A character recognition approach should also generalize well, and such a system gives good results when it is trained and tested on a standard, invariant dataset. This upload provides a standard, invariant dataset of handwritten MODI characters. The MODI-HChar dataset contains images of 57 handwritten MODI character classes: 10 numerals (0-9), 12 vowels (A-Ah), and 35 consonants (K-Dyn). In total it includes 575,920 MODI character images: 101,100 digit images, 121,320 vowel images, and 353,500 consonant images.
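The class and image counts above can be cross-checked with a few lines of Python. The figures are copied from this description; the snippet only does the arithmetic, and the name `STATED` is illustrative.

```python
# Stated composition of MODI-HChar: group -> (number of classes, number of images).
STATED = {
    "digits": (10, 101100),
    "vowels": (12, 121320),
    "consonants": (35, 353500),
}

total_classes = sum(n for n, _ in STATED.values())
total_images = sum(m for _, m in STATED.values())

# Average number of images per class implied by each group total.
per_class = {group: m / n for group, (n, m) in STATED.items()}

print(total_classes, total_images)  # 57 575920
print(per_class)  # {'digits': 10110.0, 'vowels': 10110.0, 'consonants': 10100.0}
```

The three groups add up to the stated 57 classes and 575,920 images. The digit and vowel totals correspond to exactly 10,110 images per class, and the consonant total to exactly 10,100 images per class.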


The dataset is distributed as a zip archive. MODI-HChar consists of three main folders: digits, vowels, and consonants. The digits folder contains one subfolder for each digit from zero to nine, and each of these subfolders holds 10110 images of the corresponding MODI digit. Similarly, the vowels folder contains 12 subfolders and the consonants folder contains 35 subfolders, each holding 10110 images of the corresponding MODI character. Each character image is 170x170 pixels at 96 dpi. All images are grayscale JPG files.
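The folder layout lends itself to building a labelled index directly from the directory tree. Below is a minimal Python sketch, assuming the archive has been extracted to a local folder whose top-level subfolders are named `digits`, `vowels`, and `consonants` (the exact spelling and case in the archive may differ) and that image files end in `.jpg` or `.jpeg`. The helpers `index_modi_hchar` and `count_per_class` are hypothetical names, not part of the dataset. The sketch records (path, group, class) for every image and counts images per class, so the result can be compared with the per-class figure given above.

```python
from collections import Counter
from pathlib import Path

# Assumed top-level folder names; adjust to match the extracted archive.
GROUPS = ("digits", "vowels", "consonants")
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def index_modi_hchar(root):
    """Return a list of (image_path, group, class_name) tuples.

    Hypothetical helper: assumes the layout root/<group>/<class>/<image>.jpg.
    """
    root = Path(root)
    samples = []
    for group in GROUPS:
        group_dir = root / group
        if not group_dir.is_dir():
            continue  # skip groups missing from this copy
        for class_dir in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            for img in sorted(class_dir.iterdir()):
                # Keep only image files; ignore stray notes or metadata files.
                if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                    samples.append((img, group, class_dir.name))
    return samples


def count_per_class(samples):
    """Count images per (group, class) pair."""
    return Counter((group, cls) for _, group, cls in samples)


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "MODI-HChar"
    samples = index_modi_hchar(root)
    counts = count_per_class(samples)
    print(f"{len(counts)} classes, {len(samples)} images")
    for (group, cls), n in sorted(counts.items()):
        print(f"{group}/{cls}: {n}")
```

Pixel data can then be loaded per path, for example with Pillow's `Image.open(path).convert("L")`, which gives a single-channel image that should be 170x170 for this dataset.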

Users of the MODI-HChar dataset must agree to the following:

·       Use of the dataset is restricted to research purposes only.

·       No redistribution of the dataset is allowed.

·       The dataset may be partitioned into training and testing sets as required.

·       Any publication resulting from research that uses the dataset must give due credit to the following publication:

-       Deshmukh, M. S., Patil, M. P., & Kolhe, S. R. (2015, August). Off-line Handwritten Modi Numerals Recognition using Chain Code. In Proceedings of the Third International Symposium on Women in Computing and Informatics (pp. 388-393).
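Since the terms leave the training/testing partition to the user, one possible approach is a stratified random split that keeps roughly the same test fraction in every class. This is an illustrative Python sketch, not a split prescribed by the dataset authors; the function name `stratified_split`, the 20% default, and the fixed seed are assumptions. It accepts (item, label) pairs, such as image paths paired with their class folder names.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (item, label) pairs so each label keeps about the same test fraction.

    Illustrative helper, not part of the dataset. Returns (train, test) lists.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Group samples by class label (copies, so the input list is not reordered).
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    rng = random.Random(seed)  # fixed seed -> reproducible partition
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)  # per-class test count
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Fixing `seed` makes the partition reproducible, which helps when results are compared across studies. Because the test count is rounded per class, the overall test fraction can differ slightly from `test_fraction`.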