Standard Dataset

Handwriting Sanskrit Character Recognition

Citation Author(s):
Md. Ismiel Hossen Abir
Md Jahid
Naznin Sultana Liza
Submitted by:
Md. Ismiel Hossen Abir
Last updated:
DOI:
10.21227/mczd-kh23
Data Format:

Abstract

The "Sanskrit Character Dataset" contains 44 classes of handwritten Sanskrit characters and is intended to support research in optical character recognition (OCR) and machine learning for ancient languages. Each class represents a unique Sanskrit letter and contains 50 to 80 images. To create the dataset, 8 students each handwrote samples for all 44 classes, and we then carefully photographed each handwritten sample. The resulting variety of handwriting styles is meant to make the dataset diverse, robust, and applicable to real-world use, particularly for research on handwriting variability and pattern recognition in ancient scripts.

Instructions:

To create this dataset, 8 students each handwrote samples for all 44 classes of Sanskrit characters. Afterward, we carefully photographed each handwritten sample. Each class contains between 50 and 80 images. Our aim is to apply data augmentation and use this dataset for optical character recognition (OCR).
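
Before applying augmentation, it can help to confirm that a downloaded copy matches the figures stated above (44 classes, 50 to 80 images per class). The following is a minimal sketch, not part of the dataset itself. It assumes a layout of one sub-folder per class under a root directory and image files ending in .jpg, .jpeg, or .png; neither the layout nor the file types are specified on this page, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumptions (not stated on the dataset page): one sub-folder per class,
# and images stored with one of these file extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# Figures taken from the dataset description.
EXPECTED_CLASSES = 44
MIN_PER_CLASS, MAX_PER_CLASS = 50, 80


def count_images_per_class(root):
    """Return a Counter mapping each class-folder name to its image count."""
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_counts(counts):
    """Return a list of problems; an empty list means the counts match the description."""
    problems = []
    if len(counts) != EXPECTED_CLASSES:
        problems.append(
            f"expected {EXPECTED_CLASSES} classes, found {len(counts)}"
        )
    for name, n in sorted(counts.items()):
        if not MIN_PER_CLASS <= n <= MAX_PER_CLASS:
            problems.append(
                f"class {name!r} has {n} images, "
                f"outside {MIN_PER_CLASS}-{MAX_PER_CLASS}"
            )
    return problems
```

Calling `check_counts(count_images_per_class("path/to/dataset"))` (the path is a placeholder) returns an empty list when the copy matches the description. With the stated figures, the full set would hold between 44 × 50 = 2,200 and 44 × 80 = 3,520 images.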