Standard Dataset
Handwriting Sanskrit Character Recognition
- Submitted by: Md. Ismiel Hoss...
- Last updated: Sun, 01/19/2025 - 02:44
- DOI: 10.21227/mczd-kh23
Abstract
The "Sanskrit Character Dataset" contains 44 classes of handwritten Sanskrit characters and is intended to support research in optical character recognition (OCR) and machine learning for ancient languages. Each class represents a unique Sanskrit letter and contains between 50 and 80 images. To capture handwriting variability and improve real-world applicability, the letters were written in a range of handwriting styles, which makes the dataset suited to research on ancient script recognition and pattern recognition.
To create the dataset, 8 students each handwrote samples for all 44 character classes, and each handwritten sample was then carefully photographed. Our aim is to apply data augmentation and use this dataset for OCR.
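The stated figures (44 classes, 50 to 80 images per class) bound the overall size of the dataset, and the same arithmetic shows roughly how augmentation would scale it. The following is a minimal sketch only: the helper `size_bounds` and the parameter `copies_per_image` are hypothetical names introduced for illustration, and the value of 4 augmented copies per image is an assumption, not a figure from the dataset description.

```python
# Figures taken from the dataset description.
NUM_CLASSES = 44
MIN_PER_CLASS, MAX_PER_CLASS = 50, 80


def size_bounds(num_classes, min_per_class, max_per_class, copies_per_image=0):
    """Return (low, high) bounds on the total image count.

    copies_per_image is a hypothetical augmentation setting: the number of
    augmented copies generated for each original image (0 = no augmentation).
    """
    factor = 1 + copies_per_image  # the original image plus its augmented copies
    low = num_classes * min_per_class * factor
    high = num_classes * max_per_class * factor
    return low, high


original = size_bounds(NUM_CLASSES, MIN_PER_CLASS, MAX_PER_CLASS)
# Assumed setting for illustration: 4 augmented copies per original image.
augmented = size_bounds(NUM_CLASSES, MIN_PER_CLASS, MAX_PER_CLASS, copies_per_image=4)

print(original)   # (2200, 3520)
print(augmented)  # (11000, 17600)
```

Without augmentation, the dataset therefore holds between 2,200 and 3,520 images in total. The actual count depends on the per-class totals, which the description does not list.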