MODI-HHDoc: Historical MODI Script Handwritten Document Dataset

Citation Author(s):: Manisha Deshmukh (Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon(MS), India)

Satish Kolhe (Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon(MS), India)
Submitted by:: Manisha Deshmukh
Last updated:: Tue, 06/06/2023 - 10:18
DOI:: 10.21227/1z10-w986
Data Format:: *.jpg (zip)
Research Article Link:: A hybrid text line segmentation approach for the ancient handwritten unconstrai…
Links:: The divide-and-conquer based algorithm to detect and correct the skew angle in …

A modified approach for the segmentation of unconstrained cursive Modi touching…

A dynamic statistical nonparametric cleaning and enhancement system for highly …

A New Approach for Unified Characters Cluster Segmentation of Ancient Handwritt…

Unsupervised Page Area Detection Approach for the Unconstrained Chronic Handwri…

A hybrid character segmentation approach for cursive unconstrained handwritten …

824 views

Categories:

Keywords:

Historical Modi Documents

ACCESS DATASET CITE

Abstract

Around from 12th century MODI script was used to write Indian languages as Marathi, Hindi, and Gujarati etc. It was used as administrative script from 17th century to mid of 19th century in Maharashtra state (India). At present, MODI script users are diminishing away, and countable persons can understand the MODI script. The preserved archaic historical MODI handwritten documents contained important and rare cultural, historic, and administrative kind of information which is usable in present-days. The significant information related to current era is preserved in the thousands of the archaic handwritten MODI documents at official and public sectors. MODI-HHDoc Dataset is a collection of three thousand three hundred and fifty handwritten ancient MODI document images. This dataset can be used to develop the handwritten ancient MODI document digitization, recognition, transcription, and transliteration system to gain the information written in MODI script. This dataset is collected in such way that the system should be robust enough to adapt all the variations in approach.

Instructions:

MODI-HHDoc dataset is archived in a zip file. MODI document images are stored in the twenty four folders named as 1, 2, 3….and the document image files are stored as page1.jpg, page2.jpg …. User can change the folder and image file name as per his/her convenience. Particulars of these folders are described in the description section of the dataset. Researcher can use this dataset for different purposes like to create MODI character annotation, document image preprocessing, segmentation, classification etc.

The users of the MODI-HHDoc Dataset must agree that:

· Use of the data set is restricted to research purpose only.

· No redistribution of the dataset is allowed.

· In any resultant publications of research that uses the dataset, due credits will be provided to one or more following publications:

- Deshmukh M. S., Patil, M. P., & Kolhe S. R. (2018). A hybrid text line segmentation approach for the ancient handwritten unconstrained freestyle Modi script documents. The Imaging Science Journal, 66(7), 433-442.

- Deshmukh M. S., Patil M. P., & Kolhe S. R. (2017). The divide-and-conquer based algorithm to detect and correct the skew angle in the old age historical handwritten Modi Lipi documents. Int J Comput Sci Appl, 14(2), 47-63.

- Deshmukh M. S., & Kolhe S. R. (2021). A modified approach for the segmentation of unconstrained cursive Modi touching characters cluster. In Recent Trends in Image Processing and Pattern Recognition: Third International Conference, RTIP2R 2020, Aurangabad, India, January 3–4, 2020, Revised Selected Papers, Part I 3 (pp. 431-444). Springer Singapore.

- Deshmukh M. S., Patil M. P., & Kolhe S. R. (2017, September). A dynamic statistical nonparametric cleaning and enhancement system for highly degraded ancient handwritten Modi Lipi documents. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 1545-1551). IEEE.

- Deshmukh M. S., & Kolhe S. R. (2022). A New Approach for Unified Characters Cluster Segmentation of Ancient Handwritten Modi Documents. In Computer Vision and Robotics: Proceedings of CVR 2021 (pp. 511-526). Singapore: Springer Singapore.

- Deshmukh M. S., & Kolhe S. R. (2021, March). Unsupervised Page Area Detection Approach for the Unconstrained Chronic Handwritten Modi Document Images. In 2021 International Conference on Emerging Smart Computing and Informatics (ESCI) (pp. 130-135). IEEE.

- Deshmukh M. S, & Kolhe S. R. (2019, February). A hybrid character segmentation approach for cursive unconstrained handwritten historical Modi script documents. In Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur-India.