Kadavakollu Rao

Datasets & Competitions

OCR Telugu Image Dataset

The choice of dataset is key to building an OCR system. Unfortunately, there has been very little work on Telugu character datasets. The dataset by Pramod et al. covers 500 words, with an average of 50 training images per category drawn from 50 fonts in four styles; each image is 48x48 pixels. Because they used only the most frequently occurring Telugu words, the dataset does not cover the full Telugu vocabulary. Later work shifted to the character level. The dataset by Hastie has 460 classes with 160 samples per class, made up of 500 images.
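To give a sense of scale, here is a rough estimate of the Pramod et al. training set. It assumes that the reported average of 50 images holds for every one of the 500 word categories, so the real total may differ. The 48x48 image size also fixes the length of each input vector once an image is flattened for a classifier.

```python
# Back-of-envelope size of the Pramod et al. training set.
# Assumption: the reported average of 50 images applies to all 500 word categories.
NUM_WORD_CATEGORIES = 500
AVG_IMAGES_PER_CATEGORY = 50
IMAGE_SIDE = 48  # each image is 48x48 pixels

total_images = NUM_WORD_CATEGORIES * AVG_IMAGES_PER_CATEGORY
pixels_per_image = IMAGE_SIDE * IMAGE_SIDE  # length of a flattened input vector

print(f"approx. training images: {total_images}")
print(f"pixels per image: {pixels_per_image}")
```

Under this assumption the set holds roughly 25,000 images, each a 2,304-value input when flattened.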

