Kannada Language Image Dataset

Citation Author(s): Kusumika Krori Dutta, Sunny Arokia Swamy B
Submitted by: Sunny Arokia Swamy Bellary
Last updated: Thu, 07/04/2024 - 00:45
DOI: 10.21227/26sm-ec30
Data Format: *.jpg

Keywords: Image dataset, Kannada Language, MNIST

Abstract

Kannada is a Dravidian language spoken by about 60 million people, mainly in and around the state of Karnataka, India. It is one of the 22 scheduled languages of India. The Kannada language is written in the Kannada script, which traces back to the Kadamba script (325-550 AD). Many languages that were in use centuries ago are no longer used, whereas Kannada is still used today for writing official documents and is taught in schools, which suggests it will remain in use for many years.

Students of the Center for Artificial Intelligence at M S Ramaiah Institute of Technology, Bangalore, along with a faculty member, had students write the characters by hand. Each student was given an A4 sheet with an equally spaced grid (5 rows, 10 columns) drawn on it. Each student wrote every letter 10 times, and 72 students were involved. A sample student submission is shown in the figure below. Students were asked to write in every possible direction and orientation and with various strokes. The plan was to scan every sheet in the same pattern and at the same resolution, but because of COVID-19 the students had to return home before they could submit their sheets. As a result, every submission was different and we had to perform a lot of preprocessing.
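As a rough illustration of how a scanned sheet with a 5 x 10 grid could be split into individual character cells, the sketch below computes one bounding box per cell. It assumes the grid fills the whole image with equal spacing, which the varied submissions described above would not guarantee without cropping and deskewing first. The function name `grid_cell_boxes` is hypothetical and is not part of the dataset or any published tooling.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def grid_cell_boxes(width: int, height: int, rows: int = 5, cols: int = 10) -> List[Box]:
    """Return one bounding box per grid cell, in row-major order.

    Assumption: the grid covers the full image with equal spacing.
    Real scans would need cropping and deskewing first (not shown).
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    boxes: List[Box] = []
    for r in range(rows):
        # Dividing the running product keeps adjacent cells edge-to-edge
        # and makes the last cell end exactly at the image border.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so a cell could be extracted with `page.crop(box)` after opening the scan with `Image.open`.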

The Kannada script has 13 vowels, 34 consonants, 2 other symbols and 10 numerals. Vinay Prabhu and team have worked on Kannada numerals by developing a novel image dataset for the 10 numerals. For more information, please refer to the following URL. Our focus is on developing an image database for the letters.

The dataset is divided into two subparts: vowels and consonants. There are a total of 10 vowels, each having a minimum of 100 images with a deviation of 50 images. Similarly, each of the 21 consonants has a minimum of 200 images with a deviation of 50 images.
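To check the per-class image counts stated above after downloading, something like the following sketch could be used. It assumes one subfolder per character containing *.jpg files; that layout is a guess, since the description does not specify the directory structure, and both function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import List


def count_images_per_class(root: str) -> Counter:
    """Count .jpg files in each immediate subfolder of `root`.

    Assumption: one folder per character; the dataset description
    does not confirm this layout.
    """
    counts: Counter = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            # Match the extension case-insensitively (.jpg, .JPG).
            counts[class_dir.name] = sum(
                1
                for p in class_dir.iterdir()
                if p.is_file() and p.suffix.lower() == ".jpg"
            )
    return counts


def classes_below(counts: Counter, minimum: int) -> List[str]:
    """Return the class names whose image count is below `minimum`."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For example, `classes_below(count_images_per_class("vowels"), 100)` would list any vowel folders holding fewer than the stated minimum of 100 images, where `"vowels"` is a placeholder path.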

Instructions:

These are raw images and have not been processed.

I too am interested in developing a digital image processing model for a Kannada optical character reader. Please grant me access to your dataset. I can't find the uploaded images of the dataset described above.

Prakhar Agrawal, Fri, 06/14/2024 - 06:57