OCR (Optical Character Recognition)

OCR Telugu Image Dataset

The choice of dataset is key for OCR systems. Unfortunately, there are very few works on Telugu character datasets. The work by Pramod et al. has 500 words, with an average of 50 images per category in 50 fonts and four styles for training data, each image of size 48x48. They used the most frequently occurring words in Telugu but were unable to cover all Telugu words. Later works were based on the character level. The dataset by Hastie has 460 classes and 160 samples per class, which is made up of 500 images.

Categories:

Machine Learning

Handwritten Devanagari Characters Dataset (Vowels, Consonants and Numerals) of 44,000 images for Devanagari CAPTCHA Generation and Recognition

Devanagari is a phonetic script that originated from Ancient Brahmi. It is the foundation for writing various Indian languages. According to 2022 data, Hindi, which is written in the Devanagari script, is spoken by over 342 million people worldwide and ranks third among the top 45 languages. There are approximately 11 vowels, 33 consonants, and 10 numerals in the Devanagari script. The script has no upper- or lower-case letters and is written from left to right.

